Publications

Here, you can find a list of my publications. A more up-to-date list can usually be found on my Google Scholar page.

Edward Raff, Michel Benaroch, Sagar Samtani and Andrew L Farris. What Do Machine Learning Researchers Mean by “Reproducible”? In The Thirty-Ninth AAAI Conference on Artificial Intelligence. 2025. URL BibTeX

@inproceedings{RaffReproSurvey,
	author = "Raff, Edward and Benaroch, Michel and Samtani, Sagar and Farris, Andrew L.",
	booktitle = "The Thirty-Ninth AAAI Conference on Artificial Intelligence",
	eprint = "2412.03854",
	title = "What Do Machine Learning Researchers Mean by ``Reproducible''?",
	url = "https://arxiv.org/abs/2412.03854",
	year = 2025
}

John Hurwitz, Charles Nicholas and Edward Raff. Neural Normalized Compression Distance and the Disconnect Between Compression and Classification. In Machine Learning and Compression Workshop at NeurIPS 2024. December 2024. URL BibTeX

@inproceedings{ HurwitzNCDwierd,
	title = "Neural Normalized Compression Distance and the Disconnect Between Compression and Classification",
	author = "John Hurwitz and Charles Nicholas and Edward Raff",
	booktitle = "Machine Learning and Compression Workshop at NeurIPS 2024",
	year = 2024,
	month = "December",
	url = "https://arxiv.org/abs/2410.15280"
}

Amol Khanna, Adam McCormick, Andre Nguyen, Chris Aguirre and Edward Raff. Position: Challenges and Opportunities for Differential Privacy in the US Federal Government. In 2nd Workshop on Regulatable ML at NeurIPS 2024. December 2024. URL BibTeX

@inproceedings{ KhannaDPPosition,
	title = "Position: Challenges and Opportunities for Differential Privacy in the US Federal Government",
	author = "Amol Khanna and Adam McCormick and Andre Nguyen and Chris Aguirre and Edward Raff",
	booktitle = "2nd Workshop on Regulatable ML at NeurIPS 2024",
	year = 2024,
	month = "December",
	url = "https://arxiv.org/abs/2410.16423"
}

Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, Townsend Southard Pantano, James Holt and Kristopher Micinski. Assemblage: Automatic Binary Dataset Construction for Machine Learning. In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. December 2024. URL BibTeX

@inproceedings{ liu2024assemblage,
	title = "Assemblage: Automatic Binary Dataset Construction for Machine Learning",
	author = "Chang Liu and Rebecca Saul and Yihao Sun and Edward Raff and Maya Fuchs and Townsend Southard Pantano and James Holt and Kristopher Micinski",
	booktitle = "The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track",
	year = 2024,
	month = "December",
	url = "https://openreview.net/forum?id=dsK5EmmomU"
}

Rebecca Saul, Chang Liu, Noah Fleischmann, Richard J Zak, Kristopher Micinski, Edward Raff and James Holt. Is Function Similarity Over-Engineered? Building a Benchmark. In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. December 2024. URL BibTeX

@inproceedings{ saul2024is,
	title = "Is Function Similarity Over-Engineered? Building a Benchmark",
	author = "Rebecca Saul and Chang Liu and Noah Fleischmann and Richard J Zak and Kristopher Micinski and Edward Raff and James Holt",
	booktitle = "The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track",
	year = 2024,
	month = "December",
	url = "https://openreview.net/forum?id=LOcLhezm1C"
}

Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff, Stella Biderman, Tim Oates and James Holt. A Walsh Hadamard Derived Linear Vector Symbolic Architecture. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. December 2024. URL BibTeX

@inproceedings{ alam2024a,
	title = "A Walsh Hadamard Derived Linear Vector Symbolic Architecture",
	author = "Mohammad Mahmudul Alam and Alexander Oberle and Edward Raff and Stella Biderman and Tim Oates and James Holt",
	booktitle = "The Thirty-eighth Annual Conference on Neural Information Processing Systems",
	year = 2024,
	month = "December",
	url = "https://openreview.net/forum?id=p3hNrpeWMe"
}

Skyler Wu, Fred Lu, Edward Raff and James Holt. Stabilizing Linear Passive-Aggressive Online Learning with Weighted Reservoir Sampling. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. December 2024. URL BibTeX

@inproceedings{ wu2024stabilizing,
	title = "Stabilizing Linear Passive-Aggressive Online Learning with Weighted Reservoir Sampling",
	author = "Skyler Wu and Fred Lu and Edward Raff and James Holt",
	booktitle = "The Thirty-eighth Annual Conference on Neural Information Processing Systems",
	year = 2024,
	month = "December",
	url = "https://openreview.net/forum?id=FNOBf6JM7r"
}

James Holt and Edward Raff. Malware Bytes. In The Next Wave: Cyber Analytics Research 25. April 2024. URL PDF BibTeX

@inproceedings{malwareBytes,
	title = "Malware Bytes",
	author = "Holt, James and Raff, Edward",
	booktitle = "The Next Wave: Cyber Analytics Research",
	year = 2024,
	volume = 25,
	number = 1,
	month = "April",
	publisher = "National Security Agency (NSA)",
	url = "https://www.govinfo.gov/app/details/GPO-TNW-25-1-2024/GPO-TNW-25-1-2024-6",
	pdf = "https://www.govinfo.gov/content/pkg/GPO-TNW-25-1-2024/pdf/GPO-TNW-25-1-2024-6.pdf"
}

Amol Khanna, Edward Raff and Nathan Inkawhich. SoK: A Review of Differentially Private Linear Models For High-Dimensional Data. In 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). April 2024, 57–77. URL, DOI BibTeX

@inproceedings{10516654,
	author = "Khanna, Amol and Raff, Edward and Inkawhich, Nathan",
	booktitle = "2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)",
	title = "SoK: A Review of Differentially Private Linear Models For High-Dimensional Data",
	year = 2024,
	pages = "57--77",
	abstract = "Linear models are ubiquitous in data science, but are particularly prone to overfitting and data memorization in high dimensions. To guarantee the privacy of training data, differential privacy can be used. Many papers have proposed optimization techniques for high-dimensional differentially private linear models, but a systematic comparison between these methods does not exist. We close this gap by providing a comprehensive review of optimization methods for private high-dimensional linear models. Empirical tests on all methods demonstrate robust and coordinate-optimized algorithms perform best, which can inform future research. Code for implementing all methods is released online.",
	keywords = "Systematics;Heavily-tailed distribution;Codes;Reviews;Neural networks;Optimization methods;Training data;differential privacy;high-dimensional;linear regression;logistic regression",
	doi = "10.1109/SaTML59370.2024.00012",
	month = "April",
	url = "https://arxiv.org/abs/2404.01141"
}

Sagar Samtani, Edward Raff and Hyrum Anderson. Applied Machine Learning for Information Security. Digital Threats 5(1), April 2024. URL, DOI BibTeX

@article{10.1145/3652029,
	author = "Samtani, Sagar and Raff, Edward and Anderson, Hyrum",
	title = "Applied Machine Learning for Information Security",
	year = 2024,
	issue_date = "March 2024",
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	volume = 5,
	number = 1,
	url = "https://doi.org/10.1145/3652029",
	doi = "10.1145/3652029",
	abstract = "Information security has undoubtedly become a critical aspect of modern cybersecurity practices. Over the past half-decade, numerous academic and industry groups have sought to develop machine learning, deep learning, and other areas of artificial intelligence-enabled analytics into information security practices. The Conference on Applied Machine Learning (CAMLIS) is an emerging venue that seeks to gather researchers and practitioners to discuss applied and fundamental research on machine learning for information security applications. In 2021, CAMLIS partnered with ACM Digital Threats: Research and Practice (DTRAP) to provide opportunities for authors of accepted CAMLIS papers to submit their research for consideration into ACM DTRAP via a Special Issue on Applied Machine Learning for Information Security. This editorial summarizes the results of this Special Issue.",
	journal = "Digital Threats",
	month = "apr",
	articleno = 1,
	numpages = 5,
	keywords = "Applied machine learning, deep learning, artificial intelligence, information security, cybersecurity"
}

Ethan M Rudd, David Krisiloff, Scott Coull, Daniel Olszewski, Edward Raff and James Holt. Efficient Malware Analysis Using Metric Embeddings. Digital Threats 5(1), March 2024. URL, DOI BibTeX

@article{10.1145/3615669,
	author = "Rudd, Ethan M. and Krisiloff, David and Coull, Scott and Olszewski, Daniel and Raff, Edward and Holt, James",
	title = "Efficient Malware Analysis Using Metric Embeddings",
	year = 2024,
	issue_date = "March 2024",
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	volume = 5,
	number = 1,
	url = "https://doi.org/10.1145/3615669",
	doi = "10.1145/3615669",
	abstract = "Real-world malware analysis consists of a complex pipeline of classifiers and data analysis—from detection to classification of capabilities to retrieval of unique training samples from user systems. In this article, we aim to reduce the complexity of these pipelines through the use of low-dimensional metric embeddings of Windows PE files, which can be used in a variety of downstream applications, including malware detection, family classification, and malware attribute tagging. Specifically, we enrich labeling of malicious and benign PE files with computationally-expensive, disassembly-based malicious capabilities information. Using this enhanced labeling, we derive several different types of efficient metric embeddings utilizing an embedding neural network trained via contrastive loss, Spearman rank correlation, and combinations thereof. Our evaluation examines performance on a variety of transfer tasks performed on the EMBER and SOREL datasets, demonstrating that low-dimensional, computationally-efficient metric embeddings maintain performance with little decay. This offers the potential to quickly retrain for a variety of transfer tasks at significantly reduced overhead and complexity. We conclude with an examination of practical considerations for the use of our proposed embedding approach, such as robustness to adversarial evasion and introduction of task-specific auxiliary objectives to improve performance on mission critical tasks.",
	journal = "Digital Threats",
	month = "mar",
	articleno = 4,
	numpages = 20,
	keywords = "Metric embeddings, malware analysis, transfer learning, multi-objective learning, deep learning"
}

Mohammad Mahmudul Alam, Edward Raff, Stella R Biderman, Tim Oates and James Holt. Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics 238. 2024, 4042–4050. URL PDF BibTeX

@inproceedings{pmlr-v238-mahmudul-alam24a,
	title = "Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection",
	author = "Mahmudul Alam, Mohammad and Raff, Edward and Biderman, Stella R. and Oates, Tim and Holt, James",
	booktitle = "Proceedings of The 27th International Conference on Artificial Intelligence and Statistics",
	pages = "4042--4050",
	year = 2024,
	volume = 238,
	series = "Proceedings of Machine Learning Research",
	month = "02--04 May",
	publisher = "PMLR",
	pdf = "https://proceedings.mlr.press/v238/mahmudul-alam24a/mahmudul-alam24a.pdf",
	url = "https://proceedings.mlr.press/v238/mahmudul-alam24a.html",
	abstract = "Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they’re not very suitable in this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements. Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design. HGConv kernels are defined as simple parameters learned through backpropagation. The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks. With log-linear complexity in sequence length, the empirical results demonstrate substantially faster run-time by HGConv compared to other methods achieving far more efficient scaling even with sequence length $\geq 100,000$."
}

Edward Raff and Cynthia Matuszek. Does Starting Deep Learning Homework Earlier Improve Grades? pages 381–396, Springer Nature Switzerland, 2024. URL, DOI BibTeX

@inbook{Raff2024,
	title = "Does Starting Deep Learning Homework Earlier Improve Grades?",
	isbn = 9783031504853,
	issn = "1865-0937",
	url = "http://dx.doi.org/10.1007/978-3-031-50485-3_38",
	doi = "10.1007/978-3-031-50485-3_38",
	booktitle = "Artificial Intelligence. ECAI 2023 International Workshops",
	publisher = "Springer Nature Switzerland",
	author = "Raff, Edward and Matuszek, Cynthia",
	year = 2024,
	pages = "381--396"
}

Ashley Klein, Edward Raff, Elisabeth Seamon, Lily Foley and Timothy Bussert. More Options for Prelabor Rupture of Membranes, A Bayesian Analysis. In 11th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2024. 2024.
Best Paper Award. URL BibTeX

@inproceedings{Klein24,
	author = "Klein, Ashley and Raff, Edward and Seamon, Elisabeth and Foley, Lily and Bussert, Timothy",
	booktitle = "11th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2024",
	title = "{More Options for Prelabor Rupture of Membranes, A Bayesian Analysis}",
	url = "https://www.arxiv.org/abs/2408.10876",
	year = 2024,
	award = "Best Paper",
	note = "Best Paper Award"
}

Ryan Swope, Amol Khanna, Philip Doldo, Saptarshi Roy and Edward Raff. Feature Selection from Differentially Private Correlations. In Proceedings of the 17th ACM Workshop on Artificial Intelligence and Security (AISec'24). 2024. URL BibTeX

@inproceedings{Swope24,
	author = "Swope, Ryan and Khanna, Amol and Doldo, Philip and Roy, Saptarshi and Raff, Edward",
	booktitle = "Proceedings of the 17th ACM Workshop on Artificial Intelligence and Security (AISec'24)",
	title = "{Feature Selection from Differentially Private Correlations}",
	url = "https://arxiv.org/abs/2408.10862",
	year = 2024
}

Kasra Darvish, Edward Raff, Francis Ferraro and Cynthia Matuszek. Multimodal Language Learning for Object Retrieval in Low Data Regimes in the Face of Missing Modalities. Transactions on Machine Learning Research, October 2023. URL BibTeX

@article{Multimodal_Language_Learning_for_Object_Retrieval_in_Low_Data_Regimes_in_the_Face_of_Missing_Modalities,
	author = "Kasra Darvish and Edward Raff and Francis Ferraro and Cynthia Matuszek",
	title = "{Multimodal Language Learning for Object Retrieval in Low Data Regimes in the Face of Missing Modalities}",
	month = "October",
	year = 2023,
	journal = "Transactions on Machine Learning Research",
	url = "https://openreview.net/forum?id=cXa6Xdm0v7"
}

Zheng Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji, David Ifeoluwa Adelani, Khalid Almubarak, M Saiful Bari, Lintang Sutawika, Jungo Kasai, Ahmed Baruwa, Genta Winata, Stella Biderman, Edward Raff, Dragomir Radev and Vassilina Nikoulina. BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). July 2023, 11682–11703. URL, DOI BibTeX

@inproceedings{yong-etal-2023-bloom,
	title = "{BLOOM}+1: Adding Language Support to {BLOOM} for Zero-Shot Prompting",
	author = "Yong, Zheng Xin and Schoelkopf, Hailey and Muennighoff, Niklas and Aji, Alham Fikri and Adelani, David Ifeoluwa and Almubarak, Khalid and Bari, M Saiful and Sutawika, Lintang and Kasai, Jungo and Baruwa, Ahmed and Winata, Genta and Biderman, Stella and Raff, Edward and Radev, Dragomir and Nikoulina, Vassilina",
	booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
	month = "jul",
	year = 2023,
	address = "Toronto, Canada",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2023.acl-long.653",
	doi = "10.18653/v1/2023.acl-long.653",
	pages = "11682--11703",
	abstract = "The BLOOM model is a large publicly available multilingual language model, but its pretraining was limited to 46 languages. To extend the benefits of BLOOM to other languages without incurring prohibitively large costs, it is desirable to adapt BLOOM to new languages not seen during pretraining. In this work, we apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages in a resource-constrained setting. We find language adaptation to be effective at improving zero-shot performance in new languages. Surprisingly, we find that adapter-based finetuning is more effective than continued pretraining for large models. In addition, we discover that prompting performance is not significantly affected by language specifics, such as the writing system. It is primarily determined by the size of the language adaptation data. We also add new languages to BLOOMZ, which is a multitask finetuned version of BLOOM capable of following task instructions zero-shot. We find including a new language in the multitask fine-tuning mixture to be the most effective method to teach BLOOMZ a new language. We conclude that with sufficient training data language adaptation can generalize well to diverse languages. Our code is available at \url{https://github.com/bigscience-workshop/multilingual-modeling}."
}

Catherine Ordun, Edward Raff and Sanjay Purushotham. When Visible-to-Thermal Facial GAN Beats Conditional Diffusion. In 2023 IEEE International Conference on Image Processing (ICIP). 2023, 181–185. DOI BibTeX

@inproceedings{10223118,
	author = "Ordun, Catherine and Raff, Edward and Purushotham, Sanjay",
	booktitle = "2023 IEEE International Conference on Image Processing (ICIP)",
	title = "When Visible-to-Thermal Facial GAN Beats Conditional Diffusion",
	year = 2023,
	pages = "181--185",
	doi = "10.1109/ICIP49359.2023.10223118"
}

Amol Khanna, Fred Lu and Edward Raff. The Challenge of Differentially Private Screening Rules. 2nd AdvML Frontiers Workshop at 40th International Conference on Machine Learning, 2023. URL BibTeX

@article{dpScreeningHard,
	author = "Amol Khanna and Fred Lu and Edward Raff",
	title = "The Challenge of Differentially Private Screening Rules",
	journal = "2nd AdvML Frontiers Workshop at 40th International Conference on Machine Learning",
	year = 2023,
	url = "https://arxiv.org/abs/2303.10303"
}

Robert J Joyce, Tirth Patel, Charles Nicholas and Edward Raff. AVScan2Vec: Feature Learning on Antivirus Scan Data for Production-Scale Malware Corpora. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 2023, 185–196. URL, DOI BibTeX

@inproceedings{10.1145/3605764.3623907,
	author = "Joyce, Robert J. and Patel, Tirth and Nicholas, Charles and Raff, Edward",
	title = "AVScan2Vec: Feature Learning on Antivirus Scan Data for Production-Scale Malware Corpora",
	year = 2023,
	isbn = 9798400702600,
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	url = "https://doi.org/10.1145/3605764.3623907",
	doi = "10.1145/3605764.3623907",
	abstract = "When investigating a malicious file, searching for related files is a common task that malware analysts must perform. Given that production malware corpora may contain over a billion files and consume petabytes of storage, many feature extraction and similarity search approaches are computationally infeasible. Our work explores the potential of antivirus (AV) scan data as a scalable source of features for malware. This is possible because AV scan reports are widely available through services such as VirusTotal and are ~100x smaller than the average malware sample. The information within an AV scan report is abundant with information and can indicate a malicious file's family, behavior, target operating system, and many other characteristics. We introduce AVScan2Vec, a neural model trained to comprehend the semantics of AV scan data. AVScan2Vec ingests AV scan data for a malicious file and outputs a meaningful vector representation. AVScan2Vec vectors are ~3 to 85x smaller than popular alternatives in use today, enabling faster vector comparisons and lower memory usage. By incorporating Dynamic Continuous Indexing, we show that nearest-neighbor queries on AVScan2Vec vectors can scale to even the largest malware production datasets. We also demonstrate that AVScan2Vec vectors are superior to other leading malware feature vector representations across nearly all classification, clustering, and nearest-neighbor lookup algorithms that we evaluated.",
	booktitle = "Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security",
	pages = "185--196",
	numpages = 12,
	keywords = "malware, antivirus, feature learning",
	series = "AISec '23"
}

Catherine Ordun, Edward Raff and Sanjay Purushotham. Vista Morph - Unsupervised Image Registration of Visible-Thermal Facial Pairs. In 2023 IEEE International Joint Conference on Biometrics (IJCB). 2023, 1–10. URL, DOI BibTeX

@inproceedings{10448887,
	author = "Ordun, Catherine and Raff, Edward and Purushotham, Sanjay",
	booktitle = "2023 IEEE International Joint Conference on Biometrics (IJCB)",
	title = "Vista Morph - Unsupervised Image Registration of Visible-Thermal Facial Pairs",
	year = 2023,
	url = "https://arxiv.org/abs/2306.06505",
	pages = "1--10",
	keywords = "Image registration;Visualization;Generative AI;Biometrics (access control);Generative adversarial networks;Transformers;Task analysis",
	doi = "10.1109/IJCB57857.2023.10448887"
}

Edward Raff and Andrew L Farris. A Siren Song of Open Source Reproducibility, Examples from Machine Learning. In Proceedings of the 2023 ACM Conference on Reproducibility and Replicability. 2023, 115–120. URL, DOI BibTeX

@inproceedings{10.1145/3589806.3600042,
	author = "Raff, Edward and Farris, Andrew L.",
	title = "A Siren Song of Open Source Reproducibility, Examples from Machine Learning",
	year = 2023,
	isbn = 9798400701764,
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	url = "https://doi.org/10.1145/3589806.3600042",
	doi = "10.1145/3589806.3600042",
	abstract = "As reproducibility becomes a greater concern, conferences have largely converged to a strategy of asking reviewers to indicate whether code was attached to a submission. This represents a broader pattern of implementing actions based on presumed ideals, without studying whether those actions will produce positive results. We argue that focusing on code as a means of reproduction is misguided if we want to improve the state of reproducible and replicable research. In this study, we find this focus on code may be harmful — we should not force code to be submitted. Furthermore, there is a lack of evidence that conferences take effective actions to encourage and reward reproducibility. We argue that venues must take more action to advance reproducible machine learning research today.",
	booktitle = "Proceedings of the 2023 ACM Conference on Reproducibility and Replicability",
	pages = "115--120",
	numpages = 6,
	location = "Santa Cruz, CA, USA",
	series = "ACM REP '23"
}

Tyler LeBlond, Joseph Munoz, Fred Lu, Maya Fuchs, Elliot Zaresky-Williams, Edward Raff and Brian Testa. Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 2023, 23–33. URL, DOI BibTeX

@inproceedings{10.1145/3605764.3623904,
	author = "LeBlond, Tyler and Munoz, Joseph and Lu, Fred and Fuchs, Maya and Zaresky-Williams, Elliot and Raff, Edward and Testa, Brian",
	title = "Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile",
	year = 2023,
	isbn = 9798400702600,
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	url = "https://doi.org/10.1145/3605764.3623904",
	doi = "10.1145/3605764.3623904",
	abstract = "Differential privacy (DP) is the prevailing technique for protecting user data in machine learning models. However, deficits to this framework include a lack of clarity for selecting the privacy budget ε and a lack of quantification for the privacy leakage for a particular data row by a particular trained model. We make progress toward these limitations and a new perspective by which to visualize DP results by studying a privacy metric that quantifies the extent to which a model trained on a dataset using a DP mechanism is ``covered'' by each of the distributions resulting from training on neighboring datasets. We connect this coverage metric to what has been established in the literature and use it to rank the privacy of individual samples from the training set in what we call a privacy profile. We additionally show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as ε decreases, which we suggest is a tool that can enable the selection of ε by the ML practitioner wishing to make use of DP.",
	booktitle = "Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security",
	pages = "23--33",
	numpages = 11,
	keywords = "differential privacy, machine unlearning",
	location = "Copenhagen, Denmark",
	series = "AISec '23"
}

Catherine Ordun, Alexandra Cha, Edward Raff, Sanjay Purushotham, Karen Kwok, Mason Rule and James Gulley. A Generative Approach for Image Registration of Visible-Thermal (VT) Cancer Faces. pages 91–100, Springer Nature Switzerland, 2023. URL, DOI BibTeX

@inbook{Ordun2023,
	title = "A Generative Approach for Image Registration of Visible-Thermal (VT) Cancer Faces",
	isbn = 9783031445118,
	issn = "1611-3349",
	url = "http://dx.doi.org/10.1007/978-3-031-44511-8_7",
	doi = "10.1007/978-3-031-44511-8_7",
	booktitle = "Lecture Notes in Computer Science",
	publisher = "Springer Nature Switzerland",
	author = "Ordun, Catherine and Cha, Alexandra and Raff, Edward and Purushotham, Sanjay and Kwok, Karen and Rule, Mason and Gulley, James",
	year = 2023,
	pages = "91--100"
}

Corey J Nolet, Divye Gala, Alex Fender, Mahesh Doijade, Joe Eaton, Edward Raff, John Zedlewski, Brad Rees and Tim Oates. cuSLINK: Single-Linkage Agglomerative Clustering on the GPU. pages 711–726, Springer Nature Switzerland, 2023. URL, DOI BibTeX

@inbook{Nolet2023,
	title = "cuSLINK: Single-Linkage Agglomerative Clustering on the GPU",
	isbn = 9783031434129,
	issn = "1611-3349",
	url = "http://dx.doi.org/10.1007/978-3-031-43412-9_42",
	doi = "10.1007/978-3-031-43412-9_42",
	booktitle = "Lecture Notes in Computer Science",
	publisher = "Springer Nature Switzerland",
	author = "Nolet, Corey J. and Gala, Divye and Fender, Alex and Doijade, Mahesh and Eaton, Joe and Raff, Edward and Zedlewski, John and Rees, Brad and Oates, Tim",
	year = 2023,
	pages = "711--726"
}

Edward Raff, Mark McLean and James Holt. An Easy Rejection Sampling Baseline via Gradient Refined Proposals. IOS Press, 2023. URL, DOI BibTeX

@inbook{EasyRejectionSampling,
	title = "An Easy Rejection Sampling Baseline via Gradient Refined Proposals",
	isbn = 9781643684376,
	issn = "1879-8314",
	url = "http://dx.doi.org/10.3233/FAIA230483",
	doi = "10.3233/faia230483",
	booktitle = "ECAI 2023",
	publisher = "IOS Press",
	author = "Raff, Edward and McLean, Mark and Holt, James",
	year = 2023,
}

Robert J Joyce, Edward Raff, Charles Nicholas and James Holt. MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers. Proceedings of the Conference on Applied Machine Learning in Information Security, 2023. URL BibTeX

@article{MALDICT,
	title = "MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers",
	author = "Robert J Joyce and Edward Raff and Charles Nicholas and James Holt",
	journal = "Proceedings of the Conference on Applied Machine Learning in Information Security",
	year = 2023,
	url = "https://arxiv.org/abs/2310.11706",
	code = "https://github.com/joyce8/MalDICT"
}

Edward Raff and James Holt. Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests. 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 2023. URL BibTeX

@article{MIL,
	title = "Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests",
	author = "Edward Raff and James Holt",
	journal = "37th Conference on Neural Information Processing Systems (NeurIPS 2023)",
	year = 2023,
	url = "https://arxiv.org/abs/2310.17867"
}

Mohammad Mahmudul Alam, Edward Raff, Tim Oates and Cynthia Matuszek. DDxT: Deep Generative Transformer Models for Differential Diagnosis. Deep Generative Models for Health Workshop NeurIPS 2023, 2023. BibTeX

@article{DDXT,
	title = "DDxT: Deep Generative Transformer Models for Differential Diagnosis",
	author = "Mohammad Mahmudul Alam and Edward Raff and Tim Oates and Cynthia Matuszek",
	journal = "Deep Generative Models for Health Workshop NeurIPS 2023",
	year = 2023
}

Stella Biderman, USVSN Sai Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit and Edward Raff. Emergent and Predictable Memorization in Large Language Models. 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 2023. BibTeX

@article{memorizationLLM,
	title = "Emergent and Predictable Memorization in Large Language Models",
	author = "Stella Biderman and USVSN Sai Prashanth and Lintang Sutawika and Hailey Schoelkopf and Quentin Anthony and Shivanshu Purohit and Edward Raff",
	journal = "37th Conference on Neural Information Processing Systems (NeurIPS 2023)",
	year = 2023
}

Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff and Stella Biderman. LEACE: Perfect linear concept erasure in closed form. 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 2023. URL, DOI BibTeX

@article{LEACE,
	title = "LEACE: Perfect linear concept erasure in closed form",
	author = "Nora Belrose and David Schneider-Joseph and Shauli Ravfogel and Ryan Cotterell and Edward Raff and Stella Biderman",
	journal = "37th Conference on Neural Information Processing Systems (NeurIPS 2023)",
	year = 2023,
	doi = "10.48550/arXiv.2306.03819",
	url = "https://arxiv.org/abs/2306.03819"
}

Edward Raff, Amol Ashish Khanna and Fred Lu. Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations. In Thirty-seventh Conference on Neural Information Processing Systems. 2023. URL BibTeX

@inproceedings{ raff2023scaling,
	title = "Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations",
	author = "Edward Raff and Amol Ashish Khanna and Fred Lu",
	booktitle = "Thirty-seventh Conference on Neural Information Processing Systems",
	year = 2023,
	url = "https://openreview.net/forum?id=SuvDnzrKCo"
}

Luke E Richards, Edward Raff and Cynthia Matuszek. Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 2023, 161–171. URL, DOI BibTeX

@inproceedings{10.1145/3605764.3623911,
	author = "Richards, Luke E. and Raff, Edward and Matuszek, Cynthia",
	title = "Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition",
	year = 2023,
	isbn = 9798400702600,
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	url = "https://doi.org/10.1145/3605764.3623911",
	doi = "10.1145/3605764.3623911",
	abstract = "Over the past decade, the machine learning security community has developed a myriad of defenses for evasion attacks. An understudied question in that community is: for whom do these defenses defend? This work considers common approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations. We outline appropriate parity metrics for analysis and begin to answer this question through empirical results of the fairness implications of machine learning security methods. We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training. The framework we propose for measuring defense equality can be applied to robustly trained models, preprocessing-based defenses, and rejection methods. We identify a set of datasets with a user-centered application and a reasonable computational cost suitable for case studies in measuring the equality of defenses. In our case study of speech command recognition, we show how such adversarial training and augmentation have non-equal but complex protections for social subgroups across gender, accent, and age in relation to user coverage. We present a comparison of equality between two rejection-based defenses: randomized smoothing and neural rejection, finding randomized smoothing more equitable due to the sampling mechanism for minority groups. This represents the first work examining the disparity in the adversarial robustness in the speech domain and the fairness evaluation of rejection-based defenses.",
	booktitle = "Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security",
	pages = "161–171",
	numpages = 11,
	keywords = "neural networks, fairness, machine learning security, adversarial machine learning, speech recognition",
	series = "AISec '23"
}

Amol Khanna, Fred Lu, Edward Raff and Brian Testa. Differentially Private Logistic Regression with Sparse Solutions. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 2023, 1–9. URL, DOI BibTeX

@inproceedings{10.1145/3605764.3623910,
	author = "Khanna, Amol and Lu, Fred and Raff, Edward and Testa, Brian",
	title = "Differentially Private Logistic Regression with Sparse Solutions",
	year = 2023,
	isbn = 9798400702600,
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	url = "https://doi.org/10.1145/3605764.3623910",
	doi = "10.1145/3605764.3623910",
	abstract = "LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.",
	booktitle = "Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security",
	pages = "1–9",
	numpages = 9,
	keywords = "logistic regression, sparse, differential privacy, thresholding",
	series = "AISec '23"
}

Mohammad Mahmudul Alam, Edward Raff and Tim Oates. Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations. Neuro-Symbolic Learning and Reasoning in the era of Large Language Models, 2023. URL, DOI BibTeX

@article{SubitizingHRR,
	title = "Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations",
	author = "Mohammad Mahmudul Alam and Edward Raff and Tim Oates",
	journal = "Neuro-Symbolic Learning and Reasoning in the era of Large Language Models",
	year = 2023,
	doi = "10.48550/arXiv.2312.15310",
	url = "https://arxiv.org/abs/2312.15310"
}

Tirth Patel, Fred Lu, Edward Raff, Charles Nicholas, Cynthia Matuszek and James Holt. Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! Proceedings of the Conference on Applied Machine Learning in Information Security, 2023. URL BibTeX

@article{TirthCAMLIS,
	title = "Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!",
	author = "Tirth Patel and Fred Lu and Edward Raff and Charles Nicholas and Cynthia Matuszek and James Holt",
	journal = "Proceedings of the Conference on Applied Machine Learning in Information Security",
	year = 2023,
	url = "https://arxiv.org/abs/2312.15813"
}

Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates and James Holt. Recasting self-attention with holographic reduced representations. In Proceedings of the 40th International Conference on Machine Learning. 2023. BibTeX

@inproceedings{10.5555/3618408.3618431,
	author = "Alam, Mohammad Mahmudul and Raff, Edward and Biderman, Stella and Oates, Tim and Holt, James",
	title = "Recasting self-attention with holographic reduced representations",
	year = 2023,
	publisher = "JMLR.org",
	abstract = {In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $O(T^2)$ memory and $O(T^2 H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neurosymbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including $O(T H \log H)$ time complexity, $O(T H)$ space complexity, and convergence in 10\texttimes{} fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280\texttimes{} faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer},
	booktitle = "Proceedings of the 40th International Conference on Machine Learning",
	articleno = 23,
	numpages = 18,
	location = "Honolulu, Hawaii, USA",
	series = "ICML'23"
}

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika and Oskar Van Der Wal. Pythia: a suite for analyzing large language models across training and scaling. In Proceedings of the 40th International Conference on Machine Learning. 2023. BibTeX

@inproceedings{10.5555/3618408.3618510,
	author = "Biderman, Stella and Schoelkopf, Hailey and Anthony, Quentin and Bradley, Herbie and O'Brien, Kyle and Hallahan, Eric and Khan, Mohammad Aflah and Purohit, Shivanshu and Prashanth, USVSN Sai and Raff, Edward and Skowron, Aviya and Sutawika, Lintang and Van Der Wal, Oskar",
	title = "Pythia: a suite for analyzing large language models across training and scaling",
	year = 2023,
	publisher = "JMLR.org",
	abstract = "How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.",
	booktitle = "Proceedings of the 40th International Conference on Machine Learning",
	articleno = 102,
	numpages = 34,
	location = "Honolulu, Hawaii, USA",
	series = "ICML'23"
}

Fred Lu, Edward Raff and James Holt. A coreset learning reality check. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence. 2023. URL, DOI BibTeX

@inproceedings{10.1609/aaai.v37i7.26074,
	author = "Lu, Fred and Raff, Edward and Holt, James",
	title = "A coreset learning reality check",
	year = 2023,
	isbn = "978-1-57735-880-0",
	publisher = "AAAI Press",
	url = "https://doi.org/10.1609/aaai.v37i7.26074",
	doi = "10.1609/aaai.v37i7.26074",
	abstract = "Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets. In recent years, several works have proposed methods for subsampling rows from a data matrix while maintaining relevant information for classification. While these works are supported by theory and limited experiments, to date there has not been a comprehensive evaluation of these methods. In our work, we directly compare multiple methods for logistic regression drawn from the coreset and optimal subsampling literature and discover inconsistencies in their effectiveness. In many cases, methods do not outperform simple uniform subsampling.",
	booktitle = "Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence",
	articleno = 1005,
	numpages = 9,
	series = "AAAI'23/IAAI'23/EAAI'23"
}

Marcia DesJardin, Edward Raff, Angelina Stewart, Nicholas Baranco and Dimitrios Mastrogiannis. Comparison of two methods of antepartum anticoagulation: enoxaparin until scheduled labor versus transitioning to heparin. American Journal of Obstetrics and Gynecology 228(1):S531–S532, 2023. URL, DOI BibTeX

@article{DesJardin2023,
	title = "Comparison of two methods of antepartum anticoagulation: enoxaparin until scheduled labor versus transitioning to heparin",
	volume = 228,
	issn = "0002-9378",
	url = "http://dx.doi.org/10.1016/j.ajog.2022.11.908",
	doi = "10.1016/j.ajog.2022.11.908",
	number = 1,
	journal = "American Journal of Obstetrics and Gynecology",
	publisher = "Elsevier BV",
	author = "DesJardin, Marcia and Raff, Edward and Stewart, Angelina and Baranco, Nicholas and Mastrogiannis, Dimitrios",
	year = 2023,
	pages = "S531–S532"
}

Robert J Joyce, Dev Amlani, Charles Nicholas and Edward Raff. MOTIF: A Malware Reference Dataset with Ground Truth Family Labels. Computers & Security 124:102921, 2023. URL, DOI BibTeX

@article{JOYCE2023102921,
	title = "MOTIF: A Malware Reference Dataset with Ground Truth Family Labels",
	journal = "Computers \& Security",
	volume = 124,
	pages = 102921,
	year = 2023,
	issn = "0167-4048",
	doi = "10.1016/j.cose.2022.102921",
	url = "https://www.sciencedirect.com/science/article/pii/S0167404822003133",
	author = "Robert J. Joyce and Dev Amlani and Charles Nicholas and Edward Raff"
}

Fred Lu, Edward Raff and Francis Ferraro. Neural Bregman Divergences for Distance Learning. In The Eleventh International Conference on Learning Representations. 2023. URL BibTeX

@inproceedings{ lu2023neural,
	title = "Neural Bregman Divergences for Distance Learning",
	author = "Fred Lu and Edward Raff and Francis Ferraro",
	booktitle = "The Eleventh International Conference on Learning Representations",
	year = 2023,
	url = "https://openreview.net/forum?id=nJ3Vx78Nf7p"
}

Rebecca Saul, Mohammad Mahmudul Alam, John Hurwitz, Edward Raff, Tim Oates and James Holt. Lempel-Ziv Networks. In Proceedings on "I Can't Believe It's Not Better! - Understanding Deep Learning Through Empirical Falsification" at NeurIPS 2022 Workshops 187. 2023, 1–11. URL PDF BibTeX

@inproceedings{pmlr-v187-saul23a,
	title = "Lempel-Ziv Networks",
	author = "Saul, Rebecca and Alam, Mohammad Mahmudul and Hurwitz, John and Raff, Edward and Oates, Tim and Holt, James",
	booktitle = {Proceedings on "I Can't Believe It's Not Better! - Understanding Deep Learning Through Empirical Falsification" at NeurIPS 2022 Workshops},
	pages = "1--11",
	year = 2023,
	volume = 187,
	series = "Proceedings of Machine Learning Research",
	month = "03 Dec",
	publisher = "PMLR",
	pdf = "https://proceedings.mlr.press/v187/saul23a/saul23a.pdf",
	url = "https://proceedings.mlr.press/v187/saul23a.html",
	abstract = "Sequence processing has long been a central area of machine learning research. Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences. Compression-based methods have demonstrated more robustness when processing such sequences — in particular, an approach pairing the Lempel-Ziv Jaccard Distance (LZJD) with the k-Nearest Neighbor algorithm has shown promise on long sequence problems (up to steps) involving malware classification. Unfortunately, use of LZJD is limited to discrete domains. To extend the benefits of LZJD to a continuous domain, we investigate the effectiveness of a deep-learning analog of the algorithm, the Lempel-Ziv Network. While we achieve successful proof-of-concept, we are unable to meaningfully improve on the performance of a standard LSTM across a variety of datasets and sequence processing tasks. In addition to presenting this negative result, our work highlights the problem of sub-par baseline tuning in newer research areas."
}

Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff and Colin Raffel. Crosslingual Generalization through Multitask Finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023, 15991–16111. URL, DOI BibTeX

@inproceedings{muennighoff-etal-2023-crosslingual,
	title = "Crosslingual Generalization through Multitask Finetuning",
	author = "Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Le Scao, Teven and Bari, M Saiful and Shen, Sheng and Yong, Zheng Xin and Schoelkopf, Hailey and Tang, Xiangru and Radev, Dragomir and Aji, Alham Fikri and Almubarak, Khalid and Albanie, Samuel and Alyafeai, Zaid and Webson, Albert and Raff, Edward and Raffel, Colin",
	booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
	year = 2023,
	address = "Toronto, Canada",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2023.acl-long.891",
	doi = "10.18653/v1/2023.acl-long.891",
	pages = "15991--16111",
	abstract = "Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused on English data and models. We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0. We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages that appear only in the pretraining corpus. Finetuning on multilingual tasks with English prompts further improves performance on English and non-English tasks leading to various state-of-the-art zero-shot results. We also investigate finetuning on multilingual tasks with prompts that have been machine-translated from English to match the language of each dataset. We find training on these machine-translated prompts leads to better performance on human-written prompts in the respective languages. Surprisingly, we find models are capable of zero-shot generalization to tasks in languages they have never intentionally seen. We conjecture that the models are learning higher-level capabilities that are both task- and language-agnostic. In addition, we introduce xP3, a composite of supervised datasets in 46 languages with English and machine-translated prompts. Our code, datasets and models are freely available at \url{https://github.com/bigscience-workshop/xmtf}."
}

Mike Wong, Edward Raff, James Holt and Ravi Netravali. Marvolo: Programmatic Data Augmentation for Deep Malware Detection. In Machine Learning and Knowledge Discovery in Databases: Research Track: European Conference, ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Proceedings, Part I. 2023, 270–285. URL, DOI BibTeX

@inproceedings{10.1007/978-3-031-43412-9_16,
	author = "Wong, Mike and Raff, Edward and Holt, James and Netravali, Ravi",
	title = "Marvolo: Programmatic Data Augmentation for Deep Malware Detection",
	year = 2023,
	isbn = "978-3-031-43411-2",
	publisher = "Springer-Verlag",
	address = "Berlin, Heidelberg",
	url = "https://doi.org/10.1007/978-3-031-43412-9_16",
	doi = "10.1007/978-3-031-43412-9_16",
	abstract = "Data acquisition for ML-driven malware detection is challenging. While large commercial datasets exist, they are prohibitively expensive. On the other hand, an entity (e.g., a bank or government), may be targeted with unique malware, but the data samples available will never be sufficient to train a bespoke ML-based detector. While data augmentation has been a key component in improving deep learning models by providing requisite diversity for generalization, it has proven far more challenging for malware detection. The main challenges are that (1) determining the augmentations to make is not straightforward, (2) operations are on binaries rather than source code (which is not available), complicating correctness and understanding, and (3) labeling new files mandates expensive binary reverse engineering. We present Marvolo for creating realistic, semantics preserving transformations that mimic the code alterations made by malware authors in practice, allowing us to generate augmented data on raw binary files. This also enables Marvolo to safely propagate labels to newly-generated data. Across several malware datasets and recent ML-based detectors, Marvolo improves accuracy and AUC by up to 5\% and 10\% respectively, while boosting efficiency by 79x by avoiding redundant computation.",
	booktitle = "Machine Learning and Knowledge Discovery in Databases: Research Track: European Conference, ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Proceedings, Part I",
	pages = "270–285",
	numpages = 16,
	location = "Turin, Italy"
}

Marcia DesJardin, Edward Raff, Nicholas Baranco and Dimitrios Mastrogiannis. Cross-Sectional Survey of High-Risk Pregnant Women's Opinions on COVID-19 Vaccination. Women's Health Reports 3(1):608–616, June 2022. URL, DOI BibTeX

@article{DesJardin2022a,
	author = "DesJardin, Marcia and Raff, Edward and Baranco, Nicholas and Mastrogiannis, Dimitrios",
	doi = "10.1089/whr.2022.0006",
	issn = "2688-4844",
	journal = "Women's Health Reports",
	month = "jun",
	number = 1,
	pages = "608--616",
	title = "{Cross-Sectional Survey of High-Risk Pregnant Women's Opinions on COVID-19 Vaccination}",
	volume = 3,
	url = "https://www.liebertpub.com/doi/full/10.1089/whr.2022.0006",
	year = 2022,
	topic = "medical,covid"
}

Marcia DesJardin, Edward Raff, Nicholas Baranco and Dimitrios Mastrogiannis. Pregnant Women's Opinions on the COVID-19 Vaccination in Pregnancy [A301]. Obstetrics & Gynecology 139(1):87S–87S, May 2022. URL, DOI BibTeX

@article{DesJardin2022,
	author = "DesJardin, Marcia and Raff, Edward and Baranco, Nicholas and Mastrogiannis, Dimitrios",
	doi = "10.1097/01.AOG.0000825524.73715.9a",
	issn = "0029-7844",
	journal = "Obstetrics {\&} Gynecology",
	month = "may",
	number = 1,
	pages = "87S--87S",
	title = "{Pregnant Women's Opinions on the COVID-19 Vaccination in Pregnancy [A301]}",
	volume = 139,
	url = "https://journals.lww.com/greenjournal/Abstract/2022/05001/Pregnant_Women_s_Opinions_on_the_COVID_19.298.aspx",
	year = 2022,
	topic = "medical,covid"
}

Stella Biderman and Edward Raff. Fooling MOSS Detection with Pretrained Language Models. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2022, 2933–2943. URL, DOI BibTeX

@inproceedings{10.1145/3511808.3557079,
	author = "Biderman, Stella and Raff, Edward",
	title = "Fooling MOSS Detection with Pretrained Language Models",
	year = 2022,
	isbn = 9781450392365,
	publisher = "Association for Computing Machinery",
	address = "New York, NY, USA",
	url = "https://doi.org/10.1145/3511808.3557079",
	doi = "10.1145/3511808.3557079",
	abstract = "As artificial intelligence (AI) technologies become increasingly powerful and prominent in society, their misuse is a growing concern. In educational settings, AI technologies could be used by students to cheat on assignments and exams. In this paper we explore whether transformers can be used to solve introductory level programming assignments while bypassing commonly used AI tools to detect similarities between pieces of software. We find that a student using GPT-J [60] can complete introductory level programming assignments without triggering suspicion from MOSS [2], a widely used software similarity and plagiarism detection tool. This holds despite the fact that GPT-J was not trained on the problems in question and is not provided with any examples to work from. We further find that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code. We conclude with a discussion of the ethical and educational implications of large language models and directions for future research.",
	booktitle = "Proceedings of the 31st ACM International Conference on Information \& Knowledge Management",
	pages = "2933–2943",
	numpages = 11,
	keywords = "open source software, multimodal transformers, language models, education technology",
	location = "Atlanta, GA, USA",
	series = "CIKM '22"
}

Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott Zaresky-Williams, Edward Raff, Francis Ferraro and Brian Testa. A General Framework for Auditing Differentially Private Machine Learning. In Advances in Neural Information Processing Systems 35. 2022, 4165–4176. URL BibTeX

@inproceedings{NEURIPS2022_1add3bbd,
	author = "Lu, Fred and Munoz, Joseph and Fuchs, Maya and LeBlond, Tyler and Zaresky-Williams, Elliott and Raff, Edward and Ferraro, Francis and Testa, Brian",
	booktitle = "Advances in Neural Information Processing Systems",
	pages = "4165--4176",
	publisher = "Curran Associates, Inc.",
	title = "A General Framework for Auditing Differentially Private Machine Learning",
	url = "https://proceedings.neurips.cc/paper_files/paper/2022/file/1add3bbdbc20c403a383482a665eb5a4-Paper-Conference.pdf",
	volume = 35,
	year = 2022
}

Rebecca J Newbrander, Edward Raff, Katherine Frega and Mary J Cunningham. Eliminating postoperative opioid prescriptions is associated with lower long term opioid use (523). Gynecologic Oncology, 2022. URL, DOI BibTeX

@article{Newbrander2022EliminatingPO,
	title = "Eliminating postoperative opioid prescriptions is associated with lower long term opioid use (523)",
	author = "Rebecca J. Newbrander and Edward Raff and Katherine Frega and Mary J. Cunningham",
	journal = "Gynecologic Oncology",
	doi = "10.1016/S0090-8258(22)01744-9",
	url = "https://www.gynecologiconcology-online.net/article/S0090-8258(22)01744-9/abstract",
	year = 2022
}

Robert J Joyce, Dev Amlani, Charles Nicholas and Edward Raff. MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels. In The AAAI-22 Workshop on Artificial Intelligence for Cyber Security (AICS). 2022. URL, DOI BibTeX

@inproceedings{Joyce2022,
	archiveprefix = "arXiv",
	arxivid = "arXiv:2111.15031v1",
	author = "Joyce, Robert J and Amlani, Dev and Nicholas, Charles and Raff, Edward",
	booktitle = "The AAAI-22 Workshop on Artificial Intelligence for Cyber Security (AICS)",
	doi = "10.48550/arXiv.2111.15031",
	eprint = "arXiv:2111.15031v1",
	title = "{MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels}",
	url = "https://github.com/boozallen/MOTIF",
	year = 2022,
	topic = "malware,dataset"
}

Gaoussou Youssouf Kebe, Luke E Richards, Edward Raff, Francis Ferraro and Cynthia Matuszek. Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech. In AAAI. 2022. URL BibTeX

@inproceedings{Kebe2022,
	archiveprefix = "arXiv",
	arxivid = "2112.13758",
	author = "Kebe, Gaoussou Youssouf and Richards, Luke E. and Raff, Edward and Ferraro, Francis and Matuszek, Cynthia",
	booktitle = "AAAI",
	eprint = "2112.13758",
	title = "{Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech}",
	url = "http://arxiv.org/abs/2112.13758",
	year = 2022,
	topic = "grounding"
}

Andre T Nguyen, Fred Lu, Gary Lopez Munoz, Edward Raff, Charles Nicholas and James Holt. Out of Distribution Data Detection Using Dropout Bayesian Neural Networks. In Proceedings of the 36th AAAI Conference on Artificial Intelligence. 2022. URL BibTeX

@inproceedings{Nguyen2022,
	author = "Nguyen, Andre T and Lu, Fred and Munoz, Gary Lopez and Raff, Edward and Nicholas, Charles and Holt, James",
	booktitle = "Proceedings of the 36th AAAI Conference on Artificial Intelligence",
	title = "{Out of Distribution Data Detection Using Dropout Bayesian Neural Networks}",
	year = 2022,
	url = "https://arxiv.org/abs/2202.08985",
	topic = "bayesian,outlier"
}

Fred Lu, Francis Ferraro and Edward Raff. Continuously Generalized Ordinal Regression for Linear and Deep Models. In SIAM International Conference on Data Mining (SDM22). 2022. URL BibTeX

@inproceedings{Lu2022,
	archiveprefix = "arXiv",
	arxivid = "2202.07005",
	author = "Lu, Fred and Ferraro, Francis and Raff, Edward",
	booktitle = "SIAM International Conference on Data Mining (SDM22)",
	eprint = "2202.07005",
	keywords = "ordinal regression,ranking",
	title = "{Continuously Generalized Ordinal Regression for Linear and Deep Models}",
	url = "http://arxiv.org/abs/2202.07005",
	year = 2022,
	topic = "ordinal"
}

Corey J Nolet, Divye Gala, Edward Raff, Joe Eaton, Brad Rees, John Zedlewski and Tim Oates. Semiring Primitives for Sparse Neighborhood Methods on the GPU. In MLSys Conference. 2022.
Outstanding Paper Award (1 of 5). URL BibTeX

@inproceedings{Nolet2022,
	archiveprefix = "arXiv",
	arxivid = "2104.06357",
	author = "Nolet, Corey J. and Gala, Divye and Raff, Edward and Eaton, Joe and Rees, Brad and Zedlewski, John and Oates, Tim",
	booktitle = "MLSys Conference",
	eprint = "2104.06357",
	keywords = "GPU,distance metric,gpu,nearest neighbors,semiring",
	title = "{Semiring Primitives for Sparse Neighborhood Methods on the GPU}",
	url = "http://arxiv.org/abs/2104.06357",
	award = "Outstanding Paper (1 of 5)",
	note = "Outstanding Paper Award (1 of 5)",
	year = 2022,
	topic = "fast,knn"
}

Edward Raff. Does the Market of Citations Reward Reproducible Work? In ML Evaluation Standards Workshop at ICLR 2022. 2022. URL, DOI BibTeX

@inproceedings{Raff2022,
	author = "Raff, Edward",
	booktitle = "ML Evaluation Standards Workshop at ICLR 2022",
	doi = "10.48550/arXiv.2204.03829",
	mendeley-groups = "Machine Learning/reproducability",
	title = "{Does the Market of Citations Reward Reproducible Work?}",
	url = "https://arxiv.org/abs/2204.03829",
	year = 2022,
	topic = "repro"
}

Edward Raff and Andrew L Farris. A Siren Song of Open Source Reproducibility. In ML Evaluation Standards Workshop at ICLR 2022. 2022.
Outstanding Paper Award (1 of 5). URL, DOI BibTeX

@inproceedings{Raff2022a,
	author = "Raff, Edward and Farris, Andrew L.",
	booktitle = "ML Evaluation Standards Workshop at ICLR 2022",
	doi = "10.48550/arXiv.2204.04372",
	mendeley-groups = "Machine Learning/reproducability",
	title = "{A Siren Song of Open Source Reproducibility}",
	award = "Outstanding Paper (1 of 5)",
	note = "Outstanding Paper Award (1 of 5)",
	url = "https://arxiv.org/abs/2204.04372",
	year = 2022,
	topic = "repro,opinion"
}

Mohammad Mahmudul Alam, Edward Raff, Tim Oates and James Holt. Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations. In International Conference on Machine Learning. 2022. URL BibTeX

@inproceedings{Alam2022,
	archiveprefix = "arXiv",
	arxivid = "2206.05893",
	author = "Alam, Mohammad Mahmudul and Raff, Edward and Oates, Tim and Holt, James",
	booktitle = "International Conference on Machine Learning",
	eprint = "2206.05893",
	title = "{Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations}",
	url = "http://arxiv.org/abs/2206.05893",
	code = "https://github.com/NeuromorphicComputationResearchProgram/Connectionist-Symbolic-Pseudo-Secrets",
	year = 2022,
	topic = "aml,hrr,vsa,fast"
}

Stella Biderman and Edward Raff. Neural Language Models are Effective Plagiarists. arXiv, 2022. URL, DOI BibTeX

@article{Biderman2022,
	archiveprefix = "arXiv",
	arxivid = "2201.07406",
	author = "Biderman, Stella and Raff, Edward",
	doi = "10.48550/arXiv.2201.07406",
	eprint = "2201.07406",
	journal = "arXiv",
	title = "{Neural Language Models are Effective Plagiarists}",
	url = "http://arxiv.org/abs/2201.07406",
	year = 2022,
	topic = "aml,deep"
}

Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato and Edward Raff. VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance. In ECCV. 2022. URL, DOI BibTeX

@inproceedings{Crowson2022,
	archiveprefix = "arXiv",
	arxivid = "2204.08583",
	author = "Crowson, Katherine and Biderman, Stella and Kornis, Daniel and Stander, Dashiell and Hallahan, Eric and Castricato, Louis and Raff, Edward",
	booktitle = "ECCV",
	doi = "10.48550/arXiv.2204.08583",
	eprint = "2204.08583",
	keywords = "generative adversarial networks,grounded language,image",
	title = "{VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance}",
	url = "http://arxiv.org/abs/2204.08583",
	year = 2022,
	topic = "deep,cv,generative"
}

A T Nguyen, L E Richards, G Kebe, Edward Raff, K Darvish, F Ferraro and C Matuszek. Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). June 2021, 1613–1622. URL PDF, DOI BibTeX

@inproceedings{Nguyen2020,
	author = "A. T. Nguyen and L. E. Richards and G. Kebe and Edward Raff and K. Darvish and F. Ferraro and C. Matuszek",
	booktitle = "2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",
	title = "Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning",
	year = 2021,
	pages = "1613-1622",
	abstract = "We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.",
	keywords = "manifolds;measurement;learning systems;computer vision;conferences;natural languages;robot sensing systems",
	doi = "10.1109/CVPRW53098.2021.00177",
	url = "https://doi.ieeecomputersociety.org/10.1109/CVPRW53098.2021.00177",
	pdf = "https://arxiv.org/pdf/2009.05147",
	publisher = "IEEE Computer Society",
	address = "Los Alamitos, CA, USA",
	month = "jun",
	topic = "grounding"
}

Robert J Joyce, Edward Raff and Charles Nicholas. Rank-1 Similarity Matrix Decomposition For Modeling Changes in Antivirus Consensus Through Time. In Proceedings of the Conference on Applied Machine Learning for Information Security. 2021. URL BibTeX

@inproceedings{Joyce2021,
	archiveprefix = "arXiv",
	arxivid = "2201.00757",
	author = "Joyce, Robert J and Raff, Edward and Nicholas, Charles",
	booktitle = "Proceedings of the Conference on Applied Machine Learning for Information Security",
	eprint = "2201.00757",
	title = "{Rank-1 Similarity Matrix Decomposition For Modeling Changes in Antivirus Consensus Through Time}",
	year = 2021,
	url = "http://ceur-ws.org/Vol-3095/paper5.pdf",
	topic = "malware"
}

Robert J Joyce, Edward Raff and Charles Nicholas. A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels. In Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec '21). 2021. URL, DOI BibTeX

@inproceedings{agtr,
	archiveprefix = "arXiv",
	arxivid = "2109.11126",
	author = "Joyce, Robert J and Raff, Edward and Nicholas, Charles",
	booktitle = "Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec '21)",
	doi = "10.1145/3474369.3486867",
	eprint = "2109.11126",
	publisher = "Association for Computing Machinery",
	title = "{A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels}",
	year = 2021,
	url = "https://arxiv.org/abs/2109.11126",
	topic = "malware"
}

Corey J Nolet, Victor Lafargue, Edward Raff, Thejaswi Nanditale, Tim Oates, John Zedlewski and Joshua Patterson. Bringing UMAP Closer to the Speed of Light with GPU Acceleration. In The Thirty-Fifth AAAI Conference on Artificial Intelligence. 2021. URL BibTeX

@inproceedings{Nolet2020,
	archiveprefix = "arXiv",
	arxivid = "2008.00325",
	author = "Nolet, Corey J. and Lafargue, Victor and Raff, Edward and Nanditale, Thejaswi and Oates, Tim and Zedlewski, John and Patterson, Joshua",
	booktitle = "The Thirty-Fifth AAAI Conference on Artificial Intelligence",
	eprint = "2008.00325",
	title = "{Bringing UMAP Closer to the Speed of Light with GPU Acceleration}",
	url = "http://arxiv.org/abs/2008.00325",
	year = 2021,
	topic = "fast"
}

Edward Raff. Research Reproducibility as a Survival Analysis. In The Thirty-Fifth AAAI Conference on Artificial Intelligence. 2021. URL BibTeX

@inproceedings{Raff2020c,
	archiveprefix = "arXiv",
	arxivid = "2012.09932",
	author = "Raff, Edward",
	booktitle = "The Thirty-Fifth AAAI Conference on Artificial Intelligence",
	eprint = "2012.09932",
	mendeley-groups = "Machine Learning/reproducability",
	title = "{Research Reproducibility as a Survival Analysis}",
	url = "http://arxiv.org/abs/2012.09932",
	year = 2021,
	code = "https://github.com/EdwardRaff/Research-Reproducibility-Survival-Analysis",
	topic = "repro"
}

James Holt and Edward Raff. RaNdOm Is RoBuSt: Using Randomness to Make Classifiers Resistant to Attack. The Next Wave 23(1):60–59, 2021. URL BibTeX

@article{Holt2021,
	author = "Holt, James and Raff, Edward",
	journal = "The Next Wave",
	number = 1,
	pages = "60--59",
	title = "{RaNdOm Is RoBuSt: Using Randomness to Make Classifiers Resistant to Attack}",
	url = "https://www.nsa.gov/Portals/70/documents/resources/everyone/digital-media-center/publications/the-next-wave/TNW{\_}23-1.pdf?ver=FoJ2Lq82tlu2VAih1z2VoA{\%}3D{\%}3D",
	volume = 23,
	year = 2021,
	topic = "aml"
}

Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux and Pascal Vincent. Accounting for Variance in Machine Learning Benchmarks. In Machine Learning and Systems (MLSys). 2021. URL BibTeX

@inproceedings{Bouthillier2021,
	archiveprefix = "arXiv",
	arxivid = "2103.03098",
	author = {Bouthillier, Xavier and Delaunay, Pierre and Bronzi, Mirko and Trofimov, Assya and Nichyporuk, Brennan and Szeto, Justin and Sepah, Naz and Raff, Edward and Madan, Kanika and Voleti, Vikram and Kahou, Samira Ebrahimi and Michalski, Vincent and Serdyuk, Dmitriy and Arbel, Tal and Pal, Chris and Varoquaux, Ga{\"{e}}l and Vincent, Pascal},
	booktitle = "Machine Learning and Systems (MLSys)",
	eprint = "2103.03098",
	mendeley-groups = "Machine Learning/Evaluation,Machine Learning",
	title = "{Accounting for Variance in Machine Learning Benchmarks}",
	url = "http://arxiv.org/abs/2103.03098",
	year = 2021,
	topic = "repro"
}

Edward Raff. Exact Acceleration of K-Means++ and K-Means. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. 2021, 2928–2935. URL, DOI BibTeX

@inproceedings{Raff2021,
	author = "Raff, Edward",
	booktitle = "Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21",
	doi = "10.24963/ijcai.2021/403",
	pages = "2928--2935",
	title = "{Exact Acceleration of K-Means++ and K-Means}",
	url = "https://arxiv.org/abs/2105.02936",
	year = 2021,
	code = "https://github.com/EdwardRaff/JSAT/blob/0693e5b516d80ae7efcb5427cec5dbab3262144a/JSAT/src/jsat/clustering/SeedSelectionMethods.java#L54",
	topic = "fast"
}

Catherine Ordun, Edward Raff and Sanjay Purushotham. Generating Thermal Human Faces for Physiological Assessment Using Thermal Sensor Auxiliary Labels. In ICIP. 2021. URL BibTeX

@inproceedings{Ordun2021,
	archiveprefix = "arXiv",
	arxivid = "2106.08091",
	author = "Ordun, Catherine and Raff, Edward and Purushotham, Sanjay",
	booktitle = "ICIP",
	eprint = "2106.08091",
	title = "{Generating Thermal Human Faces for Physiological Assessment Using Thermal Sensor Auxiliary Labels}",
	url = "http://arxiv.org/abs/2106.08091",
	year = 2021,
	topic = "biometrics"
}

Andre T Nguyen, Edward Raff, Charles Nicholas and James Holt. Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints. In IJCAI-21 1st International Workshop on Adaptive Cyber Defense. 2021. URL BibTeX

@inproceedings{Nguyen2021,
	archiveprefix = "arXiv",
	arxivid = "2108.04081",
	author = "Nguyen, Andre T. and Raff, Edward and Nicholas, Charles and Holt, James",
	booktitle = "IJCAI-21 1st International Workshop on Adaptive Cyber Defense",
	eprint = "2108.04081",
	title = "{Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints}",
	url = "http://arxiv.org/abs/2108.04081",
	year = 2021,
	topic = "malware,bayesian"
}

Edward Raff, William Fleshman, Richard Zak, Hyrum S Anderson, Bobby Filar and Mark McLean. Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection. In The Thirty-Fifth AAAI Conference on Artificial Intelligence. 2021. URL BibTeX

@inproceedings{Raff2020b,
	archiveprefix = "arXiv",
	arxivid = "2012.09390",
	author = "Raff, Edward and Fleshman, William and Zak, Richard and Anderson, Hyrum S. and Filar, Bobby and McLean, Mark",
	booktitle = "The Thirty-Fifth AAAI Conference on Artificial Intelligence",
	eprint = "2012.09390",
	title = "{Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection}",
	url = "http://arxiv.org/abs/2012.09390",
	code = "https://github.com/NeuromorphicComputationResearchProgram/MalConv2",
	year = 2021,
	topic = "malware,deep,fast"
}

Luke E Richards, André Nguyen, Ryan Capps, Steven Forsythe, Cynthia Matuszek and Edward Raff. Adversarial Transfer Attacks With Unknown Data and Class Overlap. In Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec '21). 2021. URL, DOI BibTeX

@inproceedings{Richards2021,
	archiveprefix = "arXiv",
	arxivid = "2109.11125",
	author = "Richards, Luke E. and Nguyen, Andr{\'{e}} and Capps, Ryan and Forsythe, Steven and Matuszek, Cynthia and Raff, Edward",
	booktitle = "Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec '21)",
	doi = "10.1145/3474369.3486862",
	eprint = "2109.11125",
	publisher = "Association for Computing Machinery",
	title = "{Adversarial Transfer Attacks With Unknown Data and Class Overlap}",
	url = "http://arxiv.org/abs/2109.11125",
	year = 2021,
	topic = "aml"
}

Gaoussou Youssouf Kebe, Padraig Higgins, Patrick Jenkins, Kasra Darvish, Ryan Barron, John Winder, Don Engel, Edward Raff, Francis Ferraro and Cynthia Matuszek. A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning. In NeurIPS. 2021. URL BibTeX

@inproceedings{Kebe2021a,
	author = "Kebe, Gaoussou Youssouf and Higgins, Padraig and Jenkins, Patrick and Darvish, Kasra and Barron, Ryan and Winder, John and Engel, Don and Raff, Edward and Ferraro, Francis and Matuszek, Cynthia",
	booktitle = "NeurIPS",
	title = "{A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning}",
	url = "https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/3416a75f4cea9109507cacd8e2f2aefc-Abstract-round1.html",
	year = 2021,
	code = "https://github.com/iral-lab/gold",
	topic = "dataset,grounding"
}

Ashwinkumar Ganesan, Hang Gao, Sunil Gandhi, Edward Raff, Tim Oates, James Holt and Mark McLean. Learning with Holographic Reduced Representations. In Advances in Neural Information Processing Systems. 2021. URL BibTeX

@inproceedings{Ganesan2021,
	archiveprefix = "arXiv",
	arxivid = "2109.02157",
	author = "Ganesan, Ashwinkumar and Gao, Hang and Gandhi, Sunil and Raff, Edward and Oates, Tim and Holt, James and McLean, Mark",
	booktitle = "Advances in Neural Information Processing Systems",
	eprint = "2109.02157",
	title = "{Learning with Holographic Reduced Representations}",
	url = "http://arxiv.org/abs/2109.02157",
	code = "https://github.com/NeuromorphicComputationResearchProgram/Learning-with-Holographic-Reduced-Representations",
	year = 2021,
	topic = "vsa,hrr"
}

Catherine Ordun, Alexandra N Cha, Edward Raff, Byron Gaskin, Alex Hanson, Mason Rule, Sanjay Purushotham and James L Gulley. Intelligent Sight and Sound: A Chronic Cancer Pain Dataset. In NeurIPS. 2021. URL BibTeX

@inproceedings{Ordun2021a,
	author = "Ordun, Catherine and Cha, Alexandra N and Raff, Edward and Gaskin, Byron and Hanson, Alex and Rule, Mason and Purushotham, Sanjay and Gulley, James L",
	booktitle = "NeurIPS",
	title = "{Intelligent Sight and Sound: A Chronic Cancer Pain Dataset}",
	year = 2021,
	url = "https://arxiv.org/abs/2204.04214",
	topic = "health,dataset"
}

John Boutsikas, Maksim E Eren, Charles Varga, Edward Raff, Cynthia Matuszek and Charles Nicholas. Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery. In Malware Technical Exchange Meeting. 2021. URL BibTeX

@inproceedings{Boutsikas2021,
	archiveprefix = "arXiv",
	arxivid = "2106.07860",
	author = "Boutsikas, John and Eren, Maksim E and Varga, Charles and Raff, Edward and Matuszek, Cynthia and Nicholas, Charles",
	booktitle = "Malware Technical Exchange Meeting",
	eprint = "2106.07860",
	title = "{Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery}",
	url = "https://arxiv.org/pdf/2106.07860.pdf",
	year = 2021,
	topic = "malware,aml"
}

Edward Raff, Bobby Filar and James Holt. Getting Passive Aggressive About False Positives: Patching Deployed Malware Detectors. In 2020 International Conference on Data Mining Workshops (ICDMW). November 2020, 506–515. URL, DOI BibTeX

@inproceedings{Raff2020d,
	author = "Raff, Edward and Filar, Bobby and Holt, James",
	booktitle = "2020 International Conference on Data Mining Workshops (ICDMW)",
	doi = "10.1109/ICDMW51313.2020.00074",
	isbn = "978-1-7281-9012-9",
	month = "nov",
	pages = "506--515",
	publisher = "IEEE",
	title = "{Getting Passive Aggressive About False Positives: Patching Deployed Malware Detectors}",
	url = "https://ieeexplore.ieee.org/document/9346444/",
	year = 2020,
	topic = "malware"
}

Wenbin Zhang, Mingli Zhang, Ji Zhang, Zhen Liu, Zhiyuan Chen, Jianwu Wang, Edward Raff and Enza Messina. Flexible and Adaptive Fairness-aware Learning in Non-stationary Data Streams. In 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). November 2020, 399–406. URL, DOI BibTeX

@inproceedings{Zhang2020a,
	author = "Zhang, Wenbin and Zhang, Mingli and Zhang, Ji and Liu, Zhen and Chen, Zhiyuan and Wang, Jianwu and Raff, Edward and Messina, Enza",
	booktitle = "2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)",
	doi = "10.1109/ICTAI50040.2020.00069",
	isbn = "978-1-7281-9228-4",
	keywords = "ai fairness,flexible fairness,online classification",
	mendeley-groups = "Machine Learning/fairness/classifiers/trees",
	month = "nov",
	pages = "399--406",
	publisher = "IEEE",
	title = "{Flexible and Adaptive Fairness-aware Learning in Non-stationary Data Streams}",
	url = "https://ieeexplore.ieee.org/document/9288346/",
	year = 2020,
	topic = "fairness"
}

Edward Raff, Charles Nicholas and Mark McLean. A New Burrows Wheeler Transform Markov Distance. In The Thirty-Fourth AAAI Conference on Artificial Intelligence. 2020, 5444–5453. URL, DOI BibTeX

@inproceedings{Raff2020,
	archiveprefix = "arXiv",
	arxivid = "1912.13046",
	author = "Raff, Edward and Nicholas, Charles and McLean, Mark",
	booktitle = "The Thirty-Fourth AAAI Conference on Artificial Intelligence",
	doi = "10.1609/aaai.v34i04.5994",
	eprint = "1912.13046",
	pages = "5444--5453",
	title = "{A New Burrows Wheeler Transform Markov Distance}",
	url = "http://arxiv.org/abs/1912.13046",
	code = "https://github.com/EdwardRaff/pyBWMD",
	year = 2020,
	topic = "malware,digest,fast"
}

Arash Rahnama, Andre T Nguyen and Edward Raff. Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, 8178–8187. URL BibTeX

@inproceedings{Rahnama2020,
	archiveprefix = "arXiv",
	arxivid = "1911.04636",
	author = "Rahnama, Arash and Nguyen, Andre T. and Raff, Edward",
	booktitle = "The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",
	eprint = "1911.04636",
	pages = "8178--8187",
	title = "{Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory}",
	url = "http://arxiv.org/abs/1911.04636",
	year = 2020,
	topic = "aml"
}

Catherine Ordun, Sanjay Purushotham and Edward Raff. Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs. In epiDAMIK 2020: 3rd epiDAMIK ACM SIGKDD International Workshop on Epidemiology meets Data Mining and Knowledge Discovery. 2020. URL BibTeX

@inproceedings{Ordun2020,
	archiveprefix = "arXiv",
	arxivid = "2005.03082",
	author = "Ordun, Catherine and Purushotham, Sanjay and Raff, Edward",
	booktitle = "epiDAMIK 2020: 3rd epiDAMIK ACM SIGKDD International Workshop on Epidemiology meets Data Mining and Knowledge Discovery",
	eprint = "2005.03082",
	title = "{Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs}",
	url = "http://arxiv.org/abs/2005.03082",
	year = 2020,
	topic = "epi,health,covid"
}

Edward Raff and Charles Nicholas. A Survey of Machine Learning Methods and Challenges for Windows Malware Classification. In NeurIPS 2020 Workshop: ML Retrospectives, Surveys & Meta-Analyses (ML-RSA). 2020.
Best Paper Award. URL BibTeX

@inproceedings{Raff2020a,
	archiveprefix = "arXiv",
	arxivid = "2006.09271",
	author = "Raff, Edward and Nicholas, Charles",
	booktitle = "NeurIPS 2020 Workshop: ML Retrospectives, Surveys {\&} Meta-Analyses (ML-RSA)",
	eprint = "2006.09271",
	keywords = "68t01,68t99,ams subject classifications,cyber security,machine learning,malware detection",
	title = "{A Survey of Machine Learning Methods and Challenges for Windows Malware Classification}",
	award = "Best Paper",
	note = "Best Paper Award",
	url = "http://arxiv.org/abs/2006.09271",
	year = 2020,
	topic = "malware,best"
}

Patrick Jenkins, Rishabh Sachdeva, Gaoussou Youssouf Kebe, Padraig Higgins, Kasra Darvish, Edward Raff, Don Engel, John Winder, Francisco Ferraro and Cynthia Matuszek. Presentation and Analysis of a Multimodal Dataset for Grounded Language Learning. arXiv, 2020. URL BibTeX

@article{Jenkins2020,
	archiveprefix = "arXiv",
	arxivid = "2007.14987",
	author = "Jenkins, Patrick and Sachdeva, Rishabh and Kebe, Gaoussou Youssouf and Higgins, Padraig and Darvish, Kasra and Raff, Edward and Engel, Don and Winder, John and Ferraro, Francisco and Matuszek, Cynthia",
	eprint = "2007.14987",
	journal = "arXiv",
	title = "{Presentation and Analysis of a Multimodal Dataset for Grounded Language Learning}",
	url = "http://arxiv.org/abs/2007.14987",
	year = 2020,
	topic = "dataset,grounding"
}

Maksim Ekin Eren, Nick Solovyev, Edward Raff, Charles Nicholas and Ben Johnson. COVID-19 Kaggle Literature Organization. In Proceedings of the ACM Symposium on Document Engineering 2020. 2020, 1–4. URL, DOI BibTeX

@inproceedings{Eren2020,
	author = "Eren, Maksim Ekin and Solovyev, Nick and Raff, Edward and Nicholas, Charles and Johnson, Ben",
	booktitle = "Proceedings of the ACM Symposium on Document Engineering 2020",
	doi = "10.1145/3395027.3419591",
	keywords = "clustering,covid-19,dimensionality reduction,document visualization",
	pages = "1--4",
	title = "{COVID-19 Kaggle Literature Organization}",
	url = "https://dl.acm.org/doi/10.1145/3395027.3419591",
	year = 2020,
	code = "https://www.kaggle.com/maksimeren/covid-19-literature-clustering",
	topic = "covid"
}

Edward Raff, Richard Zak, Gary Lopez Munoz, William Fleming, Hyrum S Anderson, Bobby Filar, Charles Nicholas and James Holt. Automatic Yara Rule Generation Using Biclustering. In 13th ACM Workshop on Artificial Intelligence and Security (AISec'20). 2020.
Best Paper Award. URL, DOI BibTeX

@inproceedings{Raff2020autoyara,
	archiveprefix = "arXiv",
	arxivid = "2009.03779",
	author = "Raff, Edward and Zak, Richard and Munoz, Gary Lopez and Fleming, William and Anderson, Hyrum S. and Filar, Bobby and Nicholas, Charles and Holt, James",
	booktitle = "13th ACM Workshop on Artificial Intelligence and Security (AISec'20)",
	doi = "10.1145/3411508.3421372",
	eprint = "2009.03779",
	title = "{Automatic Yara Rule Generation Using Biclustering}",
	award = "Best Paper",
	note = "Best Paper Award",
	url = "http://arxiv.org/abs/2009.03779",
	code = "https://github.com/NeuromorphicComputationResearchProgram/AutoYara",
	year = 2020,
	topic = "malware"
}

Catherine Ordun, Edward Raff and Sanjay Purushotham. The Use of AI for Thermal Emotion Recognition: A Review of Problems and Limitations in Standard Design and Data. In AAAI FSS-20: Artificial Intelligence in Government and Public Sector. 2020. URL BibTeX

@inproceedings{Ordun2020a,
	archiveprefix = "arXiv",
	arxivid = "2009.10589",
	author = "Ordun, Catherine and Raff, Edward and Purushotham, Sanjay",
	booktitle = "AAAI FSS-20: Artificial Intelligence in Government and Public Sector",
	eprint = "2009.10589",
	title = "{The Use of AI for Thermal Emotion Recognition: A Review of Problems and Limitations in Standard Design and Data}",
	url = "http://arxiv.org/abs/2009.10589",
	year = 2020,
	topic = "biometrics"
}

Nisha Pillai, Edward Raff, Francis Ferraro and Cynthia Matuszek. Sampling Approach Matters: Active Learning for Robotic Language Acquisition. In 2020 IEEE International Conference on Big Data (Big Data). 2020. URL BibTeX

@inproceedings{Pillai2020,
	archiveprefix = "arXiv",
	arxivid = "2011.08021",
	author = "Pillai, Nisha and Raff, Edward and Ferraro, Francis and Matuszek, Cynthia",
	booktitle = "2020 IEEE International Conference on Big Data (Big Data)",
	eprint = "2011.08021",
	title = "{Sampling Approach Matters: Active Learning for Robotic Language Acquisition}",
	url = "http://arxiv.org/abs/2011.08021",
	year = 2020,
	topic = "grounding"
}

Jared Sylvester and Edward Raff. Trimming the Thorns of AI Fairness Research. Data Engineering 43(4):74–84, 2020. URL BibTeX

@article{Sylvester2020,
	author = "Sylvester, Jared and Raff, Edward",
	journal = "Data Engineering",
	number = 4,
	pages = "74--84",
	title = "{Trimming the Thorns of AI Fairness Research}",
	volume = 43,
	year = 2020,
	url = "http://sites.computer.org/debull/A20dec/p74.pdf",
	topic = "opinion"
}

Andre T Nguyen, Edward Raff and Aaron Sant-Miller. Would a File by Any Other Name Seem as Malicious?. In 2019 IEEE International Conference on Big Data (Big Data). December 2019, 1322–1331. URL, DOI BibTeX

@inproceedings{Nguyen2019_filename_malicious,
	author = "Nguyen, Andre T and Raff, Edward and Sant-Miller, Aaron",
	booktitle = "2019 IEEE International Conference on Big Data (Big Data)",
	doi = "10.1109/BigData47090.2019.9006132",
	isbn = "978-1-7281-0858-2",
	month = "dec",
	pages = "1322--1331",
	publisher = "IEEE",
	title = "{Would a File by Any Other Name Seem as Malicious?}",
	url = "https://ieeexplore.ieee.org/document/9006132/",
	year = 2019,
	topic = "malware"
}

Ashley Klein, Julio J Jauregui, Edward Raff, R Frank Henn, S Ashfaq Hasan and Mohit Gilotra. Early outcomes and complications of obese patients undergoing shoulder arthroplasty: A meta-analysis. Journal of Clinical Orthopaedics and Trauma, September 2019. URL, DOI BibTeX

@article{Klein2019,
	author = "Klein, Ashley and Jauregui, Julio J. and Raff, Edward and Henn, R. Frank and Hasan, S. Ashfaq and Gilotra, Mohit",
	doi = "10.1016/J.JCOT.2019.09.002",
	issn = "0976-5662",
	journal = "Journal of Clinical Orthopaedics and Trauma",
	mendeley-groups = "Me",
	month = "sep",
	publisher = "Elsevier",
	title = "{Early outcomes and complications of obese patients undergoing shoulder arthroplasty: A meta-analysis}",
	url = "https://www.sciencedirect.com/science/article/pii/S0976566219303686",
	year = 2019,
	topic = "health,medical"
}

William Fleshman, Edward Raff, Jared Sylvester, Steven Forsyth and Mark McLean. Non-Negative Networks Against Adversarial Attacks. AAAI-2019 Workshop on Artificial Intelligence for Cyber Security, 2019. URL BibTeX

@article{Fleshman2018a,
	archiveprefix = "arXiv",
	arxivid = "1806.06108",
	author = "Fleshman, William and Raff, Edward and Sylvester, Jared and Forsyth, Steven and McLean, Mark",
	eprint = "1806.06108",
	journal = "AAAI-2019 Workshop on Artificial Intelligence for Cyber Security",
	title = "{Non-Negative Networks Against Adversarial Attacks}",
	url = "http://arxiv.org/abs/1806.06108",
	year = 2019,
	topic = "aml,malware"
}

Andre T Nguyen and Edward Raff. Adversarial Attacks, Regression, and Numerical Stability Regularization. In The AAAI-19 Workshop on Engineering Dependable and Secure Machine Learning Systems. 2019. URL BibTeX

@inproceedings{Nguyen2019_ANSR,
	author = "Nguyen, Andre T and Raff, Edward",
	booktitle = "The AAAI-19 Workshop on Engineering Dependable and Secure Machine Learning Systems",
	title = "{Adversarial Attacks, Regression, and Numerical Stability Regularization}",
	url = "https://arxiv.org/pdf/1812.02885.pdf",
	year = 2019,
	topic = "aml"
}

Edward Raff, Shannon Lantzy and Ezekiel J Maier. Dr. AI, Where Did You Get Your Degree?. In Artificial Intelligence in Health. 2019, 76–83. URL BibTeX

@inproceedings{dr_ai_long,
	author = "Raff, Edward and Lantzy, Shannon and Maier, Ezekiel J",
	booktitle = "Artificial Intelligence in Health",
	isbn = "978-3-030-12738-1",
	pages = "76--83",
	publisher = "Springer International Publishing",
	title = "{Dr. AI, Where Did You Get Your Degree?}",
	url = "https://www.edwardraff.com/publications/dr-ai-degree-long.pdf",
	year = 2019,
	topic = "opinion,health"
}

E Raff and J Sylvester. Gradient reversal against discrimination: A fair neural network learning approach. In Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018. 2019. URL, DOI BibTeX

@inproceedings{Raff2019b,
	author = "Raff, E. and Sylvester, J.",
	booktitle = "Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018",
	doi = "10.1109/DSAA.2018.00029",
	isbn = 9781538650905,
	keywords = "Ease of use,Fairness,Neural networks",
	title = "{Gradient reversal against discrimination: A fair neural network learning approach}",
	url = "https://www.edwardraff.com/publications/grad_dsaa.pdf",
	year = 2019,
	topic = "fairness"
}

Edward Raff, Joe Aurelio and Charles Nicholas. PyLZJD: An Easy to Use Tool for Machine Learning. In Proceedings of the 18th Python in Science Conference. 2019, 97–102. URL, DOI BibTeX

@inproceedings{pylzjd-proc-scipy-2019,
	author = "Raff, Edward and Aurelio, Joe and Nicholas, Charles",
	booktitle = "Proceedings of the 18th Python in Science Conference",
	doi = "10.25080/Majora-7ddc1dd1-00e",
	pages = "97--102",
	title = "{PyLZJD: An Easy to Use Tool for Machine Learning}",
	url = "http://conference.scipy.org/proceedings/scipy2019/pylzjd.html",
	year = 2019,
	topic = "fast,digest"
}

Arash Rahnama, Andre T Nguyen and Edward Raff. Connecting Lyapunov Control Theory to Adversarial Attacks. In Proceedings of AdvML'19: Workshop on Adversarial Learning Methods for Machine Learning and Data Mining at KDD. 2019. URL BibTeX

@inproceedings{Rahnama2019,
	archiveprefix = "arXiv",
	arxivid = "1907.07732",
	author = "Rahnama, Arash and Nguyen, Andre T. and Raff, Edward",
	booktitle = "Proceedings of AdvML'19: Workshop on Adversarial Learning Methods for Machine Learning and Data Mining at KDD",
	eprint = "1907.07732",
	title = "{Connecting Lyapunov Control Theory to Adversarial Attacks}",
	url = "http://arxiv.org/abs/1907.07732",
	year = 2019,
	topic = "aml"
}

Andre T Nguyen and Edward Raff. Heterogeneous Relational Kernel Learning. In 5th KDD Workshop on Mining and Learning from Time Series. 2019. URL BibTeX

@inproceedings{Nguyen2019,
	archiveprefix = "arXiv",
	arxivid = "1908.09219",
	author = "Nguyen, Andre T. and Raff, Edward",
	booktitle = "5th KDD Workshop on Mining and Learning from Time Series",
	eprint = "1908.09219",
	keywords = "clustering,gaussian processes,kernel learning,time series",
	mendeley-groups = "Machine Learning/Explanatory/autostat",
	title = "{Heterogeneous Relational Kernel Learning}",
	url = "http://arxiv.org/abs/1908.09219",
	year = 2019,
	topic = "health"
}

Andre T Nguyen, Julia Lien, Edward Raff and Sumiko R Mekaru. Improved Automatic Pharmacovigilance: An Enhancement to the MedWatcher Social System for Monitoring Adverse Events. In Epidemiology Meets Data Mining and Knowledge Discovery Workshop at KDD. 2019. URL, DOI BibTeX

@inproceedings{Nguyen2019_pharma,
	author = "Nguyen, Andre T and Lien, Julia and Raff, Edward and Mekaru, Sumiko R.",
	booktitle = "Epidemiology Meets Data Mining and Knowledge Discovery Workshop at KDD",
	doi = "10.1101/717421",
	title = "{Improved Automatic Pharmacovigilance: An Enhancement to the MedWatcher Social System for Monitoring Adverse Events}",
	url = "https://www.biorxiv.org/content/10.1101/717421v1",
	year = 2019,
	topic = "health,epi"
}

Edward Raff, Jared Sylvester, Steven Forsyth and Mark McLean. Barrage of Random Transforms for Adversarially Robust Defense. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019, 6528–6537. URL BibTeX

@inproceedings{Raff_BaRT_2019,
	address = "Long Beach, CA",
	author = "Raff, Edward and Sylvester, Jared and Forsyth, Steven and McLean, Mark",
	booktitle = "The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
	pages = "6528--6537",
	title = "{Barrage of Random Transforms for Adversarially Robust Defense}",
	url = "http://openaccess.thecvf.com/content{\_}CVPR{\_}2019/html/Raff{\_}Barrage{\_}of{\_}Random{\_}Transforms{\_}for{\_}Adversarially{\_}Robust{\_}Defense{\_}CVPR{\_}2019{\_}paper.html",
	year = 2019,
	topic = "aml"
}

Edward Raff. A Step Toward Quantifying Independently Reproducible Machine Learning Research. In NeurIPS. 2019. URL BibTeX

@inproceedings{Raff2019_quantify_repro,
	archiveprefix = "arXiv",
	arxivid = "1909.06674",
	author = "Raff, Edward",
	booktitle = "NeurIPS",
	eprint = "1909.06674",
	mendeley-groups = "Machine Learning/reproducability/attempts",
	title = "{A Step Toward Quantifying Independently Reproducible Machine Learning Research}",
	url = "http://arxiv.org/abs/1909.06674",
	year = 2019,
	topic = "repro"
}

Edward Raff, William Fleming, Richard Zak, Hyrum Anderson, Bill Finlayson, Charles K Nicholas and Mark McLean. KiloGrams: Very Large N-Grams for Malware Classification. In Proceedings of KDD 2019 Workshop on Learning and Mining for Cybersecurity (LEMINCS'19). 2019. URL BibTeX

@inproceedings{Kilograms_2019,
	author = "Raff, Edward and Fleming, William and Zak, Richard and Anderson, Hyrum and Finlayson, Bill and Nicholas, Charles K. and McLean, Mark",
	booktitle = "Proceedings of KDD 2019 Workshop on Learning and Mining for Cybersecurity (LEMINCS'19)",
	title = "{KiloGrams: Very Large N-Grams for Malware Classification}",
	url = "https://arxiv.org/abs/1908.00200",
	year = 2019,
	topic = "fast,ngram,malware"
}

Edward Raff and Mark McLean. Hash-Grams On Many-Cores and Skewed Distributions. In 2018 IEEE International Conference on Big Data (Big Data). December 2018, 158–165. URL, DOI BibTeX

@inproceedings{raff_hash_gram_parallel,
	author = "Raff, Edward and McLean, Mark",
	booktitle = "2018 IEEE International Conference on Big Data (Big Data)",
	doi = "10.1109/BigData.2018.8622043",
	isbn = "978-1-5386-5035-6",
	month = "dec",
	pages = "158--165",
	publisher = "IEEE",
	title = "{Hash-Grams On Many-Cores and Skewed Distributions}",
	url = "https://ieeexplore.ieee.org/document/8622043/",
	year = 2018,
	topic = "fast,ngram,malware"
}

Edward Raff and Jared Sylvester. Linear Models with Many Cores and CPUs: A Stochastic Atomic Update Scheme. In 2018 IEEE International Conference on Big Data (Big Data). December 2018, 65–73. URL, DOI BibTeX

@inproceedings{raff_saus,
	author = "Raff, Edward and Sylvester, Jared",
	booktitle = "2018 IEEE International Conference on Big Data (Big Data)",
	doi = "10.1109/BigData.2018.8622172",
	isbn = "978-1-5386-5035-6",
	month = "dec",
	pages = "65--73",
	publisher = "IEEE",
	title = "{Linear Models with Many Cores and CPUs: A Stochastic Atomic Update Scheme}",
	url = "https://ieeexplore.ieee.org/document/8622172/",
	year = 2018,
	topic = "fast,malware"
}

Edward Raff. Neural Fingerprint Enhancement. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). December 2018, 118–124. URL, DOI BibTeX

@inproceedings{raff_nfe,
	author = "Raff, Edward",
	booktitle = "2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)",
	doi = "10.1109/ICMLA.2018.00025",
	month = "dec",
	pages = "118--124",
	title = "{Neural Fingerprint Enhancement}",
	year = 2018,
	url = "https://www.edwardraff.com/publications/Neural_Fingerprint_Enhancement.pdf",
	topic = "biometrics"
}

Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro and Charles Nicholas. Malware Detection by Eating a Whole EXE. In AAAI Workshop on Artificial Intelligence for Cyber Security. October 2018. URL BibTeX

@inproceedings{MalConv,
	archiveprefix = "arXiv",
	arxivid = "1710.09435",
	author = "Raff, Edward and Barker, Jon and Sylvester, Jared and Brandon, Robert and Catanzaro, Bryan and Nicholas, Charles",
	booktitle = "AAAI Workshop on Artificial Intelligence for Cyber Security",
	eprint = "1710.09435",
	month = "oct",
	title = "{Malware Detection by Eating a Whole EXE}",
	url = "http://arxiv.org/abs/1710.09435",
	year = 2018,
	topic = "malware,deep"
}

William Fleshman, Edward Raff, Richard Zak, Mark McLean and Charles Nicholas. Static Malware Detection & Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus. In 2018 13th International Conference on Malicious and Unwanted Software (MALWARE). October 2018, 1–10.
Best Paper Award. URL, DOI BibTeX

@inproceedings{Fleshman2018,
	author = "Fleshman, William and Raff, Edward and Zak, Richard and McLean, Mark and Nicholas, Charles",
	booktitle = "2018 13th International Conference on Malicious and Unwanted Software (MALWARE)",
	doi = "10.1109/MALWARE.2018.8659360",
	month = "oct",
	pages = "1--10",
	publisher = "IEEE",
	title = "{Static Malware Detection {\&} Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus}",
	url = "https://ieeexplore.ieee.org/document/8659360/",
	archiveprefix = "arXiv",
	eprint = "1806.04773",
	year = 2018,
	award = "Best Paper",
	note = "Best Paper Award",
	topic = "aml,malware"
}

Edward Raff and Charles K Nicholas. Lempel-Ziv Jaccard Distance, an effective alternative to ssdeep and sdhash. Digital Investigation, February 2018. URL, DOI BibTeX

@article{raff_lzjd_digest,
	archiveprefix = "arXiv",
	arxivid = "1708.03346",
	author = "Raff, Edward and Nicholas, Charles K.",
	doi = "10.1016/j.diin.2017.12.004",
	eprint = "1708.03346",
	issn = "1742-2876",
	journal = "Digital Investigation",
	mendeley-groups = "Digitial Forensics/approximate file search,{\_}inprogress",
	month = "feb",
	title = "{Lempel-Ziv Jaccard Distance, an effective alternative to ssdeep and sdhash}",
	url = "https://doi.org/10.1016/j.diin.2017.12.004",
	year = 2018,
	topic = "malware,digest,fast"
}

Edward Raff, Jared Sylvester and Steven Mills. Fair Forests: Regularized Tree Induction to Minimize Model Bias. In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society. 2018. URL BibTeX

@inproceedings{fairForests,
	archiveprefix = "arXiv",
	arxivid = "1712.08197",
	author = "Raff, Edward and Sylvester, Jared and Mills, Steven",
	booktitle = "AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society",
	eprint = "1712.08197",
	mendeley-groups = "Machine Learning/fairness/classifiers/trees",
	title = "{Fair Forests: Regularized Tree Induction to Minimize Model Bias}",
	url = "http://arxiv.org/abs/1712.08197",
	year = 2018,
	topic = "fairness"
}

Edward Raff and Charles Nicholas. Toward Metric Indexes for Incremental Insertion and Querying. arXiv, 2018. URL BibTeX

@article{Raff2018_metric_index,
	archiveprefix = "arXiv",
	arxivid = "1801.05055",
	author = "Raff, Edward and Nicholas, Charles",
	eprint = "1801.05055",
	journal = "arXiv",
	keywords = "incremental,metric index,metric space,nearest neighbor,search",
	title = "{Toward Metric Indexes for Incremental Insertion and Querying}",
	url = "http://arxiv.org/abs/1801.05055",
	year = 2018,
	topic = "fast,knn"
}

Edward Raff and Charles Nicholas. Hash-Grams: Faster N-Gram Features for Classification and Malware Detection. In Proceedings of the ACM Symposium on Document Engineering 2018. 2018. URL, DOI BibTeX

@inproceedings{hashgram_2018,
	address = "Halifax, NS, Canada",
	author = "Raff, Edward and Nicholas, Charles",
	booktitle = "Proceedings of the ACM Symposium on Document Engineering 2018",
	doi = "10.1145/3209280.3229085",
	publisher = "ACM",
	title = "{Hash-Grams: Faster N-Gram Features for Classification and Malware Detection}",
	url = "http://doi.acm.org/10.1145/3209280.3229085",
	year = 2018,
	topic = "fast,ngram"
}

Edward Raff, Shannon Lantzy and Ezekiel Maier. Dr. AI, Where did you get your degree? Proceedings of the First Joint Workshop on AI in Health organized as part of the Federated AI Meeting (FAIM 2018) 2142:204–207, 2018. URL BibTeX

@article{dr_ai_short,
	address = "Stockholm, Sweden",
	author = "Raff, Edward and Lantzy, Shannon and Maier, Ezekiel",
	journal = "Proceedings of the First Joint Workshop on AI in Health organized as part of the Federated AI Meeting (FAIM 2018)",
	keywords = "clinical applications,continuous learning,regulation",
	pages = "204--207",
	publisher = "CEUR Workshop Proceedings",
	title = "{Dr. AI, Where did you get your degree?}",
	url = "http://ceur-ws.org/Vol-2142/short11.pdf",
	volume = 2142,
	year = 2018,
	topic = "health,opinion"
}

Jared Sylvester and Edward Raff. What About Applied Fairness? In Machine Learning: The Debates (ML-D) organized as part of the Federated AI Meeting (FAIM 2018). 2018. URL BibTeX

@inproceedings{applied_fairness_2018,
	archiveprefix = "arXiv",
	arxivid = "1806.05250",
	author = "Sylvester, Jared and Raff, Edward",
	booktitle = "Machine Learning: The Debates (ML-D) organized as part of the Federated AI Meeting (FAIM 2018)",
	eprint = "1806.05250",
	title = "{What About Applied Fairness?}",
	url = "http://arxiv.org/abs/1806.05250",
	year = 2018,
	topic = "fairness"
}

Edward Raff and Jared Sylvester. Gradient Reversal Against Discrimination. In Proceedings of the 5th Workshop on Fairness, Accountability and Transparency in Machine Learning. 2018. URL BibTeX

@inproceedings{grad_fair,
	address = "Stockholm, Sweden",
	archiveprefix = "arXiv",
	arxivid = "1807.00392",
	author = "Raff, Edward and Sylvester, Jared",
	booktitle = "Proceedings of the 5th Workshop on Fairness, Accountability and Transparency in Machine Learning",
	eprint = "1807.00392",
	title = "{Gradient Reversal Against Discrimination}",
	url = "http://arxiv.org/abs/1807.00392",
	year = 2018,
	topic = "fairness"
}

Edward Raff. Growing and Retaining AI Talent for the United States Government. In AAAI FSS-18: Artificial Intelligence in Government and Public Sector. 2018. URL BibTeX

@inproceedings{raff_ai_gov_retention,
	address = "Arlington, Virginia, United States",
	archiveprefix = "arXiv",
	arxivid = "1809.10276",
	author = "Raff, Edward",
	booktitle = "AAAI FSS-18: Artificial Intelligence in Government and Public Sector",
	eprint = "1809.10276",
	title = "{Growing and Retaining AI Talent for the United States Government}",
	url = "https://arxiv.org/abs/1809.10276",
	year = 2018,
	topic = "opinion"
}

Edward Raff, Jared Sylvester and Charles Nicholas. Engineering a Simplified 0-Bit Consistent Weighted Sampling. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018, 1203–1212. URL, DOI BibTeX

@inproceedings{scws_18,
	address = "New York, NY, USA",
	author = "Raff, Edward and Sylvester, Jared and Nicholas, Charles",
	booktitle = "Proceedings of the 27th ACM International Conference on Information and Knowledge Management",
	doi = "10.1145/3269206.3271690",
	isbn = "978-1-4503-6014-2",
	keywords = "consistent weighted sampling,jaccard similarity,min-hashing",
	pages = "1203--1212",
	publisher = "ACM",
	series = "CIKM '18",
	title = "{Engineering a Simplified 0-Bit Consistent Weighted Sampling}",
	url = "http://doi.acm.org/10.1145/3269206.3271690",
	year = 2018,
	topic = "fast"
}

Josh Sullivan, Josh Elliot, Kirsten Lloyd and Edward Raff. My Fair Data: How the Government Can Limit Bias in Artificial Intelligence. The Atlantic, 2018. URL BibTeX

@misc{Sullivan2018,
	author = "Sullivan, Josh and Elliot, Josh and Lloyd, Kirsten and Raff, Edward",
	howpublished = "The Atlantic",
	title = "{My Fair Data: How the Government Can Limit Bias in Artificial Intelligence}",
	url = "https://www.theatlantic.com/sponsored/booz-allen-hamilton-2018/how-government-can-limit-bias-in-ai/1972/",
	year = 2018,
	topic = "opinion"
}

Richard Zak, Edward Raff and Charles Nicholas. What can N-grams learn for malware detection? In 2017 12th International Conference on Malicious and Unwanted Software (MALWARE). October 2017, 109–118. URL, DOI BibTeX

@inproceedings{Zak2017,
	author = "Zak, Richard and Raff, Edward and Nicholas, Charles",
	booktitle = "2017 12th International Conference on Malicious and Unwanted Software (MALWARE)",
	doi = "10.1109/MALWARE.2017.8323963",
	isbn = "978-1-5386-1436-5",
	month = "oct",
	pages = "109--118",
	publisher = "IEEE",
	title = "{What can N-grams learn for malware detection?}",
	url = "http://ieeexplore.ieee.org/document/8323963/",
	year = 2017,
	topic = "malware,ngram"
}

Edward Raff. JSAT: Java Statistical Analysis Tool, a Library for Machine Learning. Journal of Machine Learning Research 18(23):1–5, 2017. URL BibTeX

@article{JMLR:v18:16-131,
	author = "Raff, Edward",
	journal = "Journal of Machine Learning Research",
	mendeley-groups = "Machine Learning/Library Papers",
	number = 23,
	pages = "1--5",
	title = "{JSAT: Java Statistical Analysis Tool, a Library for Machine Learning}",
	url = "http://jmlr.org/papers/v18/16-131.html",
	volume = 18,
	year = 2017,
	topic = "fast,code"
}

Edward Raff and Charles Nicholas. An Alternative to NCD for Large Sequences, Lempel-Ziv Jaccard Distance. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17. 2017, 1007–1015. URL, DOI BibTeX

@inproceedings{raff_lzjd_2017,
	address = "New York, New York, USA",
	author = "Raff, Edward and Nicholas, Charles",
	booktitle = "Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17",
	doi = "10.1145/3097983.3098111",
	isbn = "978-1-4503-4887-4",
	keywords = "cyber security,jaccard similarity,lempel-ziv,malware classification,normalized compression distance",
	mendeley-groups = "{\_}inprogress",
	pages = "1007--1015",
	publisher = "ACM Press",
	title = "{An Alternative to NCD for Large Sequences, Lempel-Ziv Jaccard Distance}",
	url = "http://dl.acm.org/citation.cfm?doid=3097983.3098111",
	year = 2017,
	topic = "malware,digest"
}

Edward Raff and Charles Nicholas. Malware Classification and Class Imbalance via Stochastic Hashed LZJD. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 2017, 111–120. URL, DOI BibTeX

@inproceedings{raff_shwel,
	address = "New York, NY, USA",
	author = "Raff, Edward and Nicholas, Charles",
	booktitle = "Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security",
	doi = "10.1145/3128572.3140446",
	isbn = "978-1-4503-5202-4",
	keywords = "cyber security,lzjd,malware classification,shwel",
	pages = "111--120",
	publisher = "ACM",
	series = "AISec '17",
	title = "{Malware Classification and Class Imbalance via Stochastic Hashed LZJD}",
	url = "http://doi.acm.org/10.1145/3128572.3140446",
	year = 2017,
	topic = "malware,fast"
}

Edward Raff, Jared Sylvester and Charles Nicholas. Learning the PE Header, Malware Detection with Minimal Domain Knowledge. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 2017, 121–132. URL, DOI BibTeX

@inproceedings{raff2017peheader,
	address = "New York, NY, USA",
	author = "Raff, Edward and Sylvester, Jared and Nicholas, Charles",
	booktitle = "Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security",
	doi = "10.1145/3128572.3140442",
	isbn = "978-1-4503-5202-4",
	keywords = "cyber security,deep learning,malware detection",
	pages = "121--132",
	publisher = "ACM",
	series = "AISec '17",
	title = "{Learning the PE Header, Malware Detection with Minimal Domain Knowledge}",
	url = "http://doi.acm.org/10.1145/3128572.3140442",
	year = 2017,
	topic = "malware"
}

Edward Raff, Richard Zak, Russell Cox, Jared Sylvester, Paul Yacci, Rebecca Ward, Anna Tracy, Mark McLean and Charles Nicholas. An investigation of byte n-gram features for malware classification. Journal of Computer Virology and Hacking Techniques, September 2016. URL, DOI BibTeX

@article{raff_ngram_2016,
	author = "Raff, Edward and Zak, Richard and Cox, Russell and Sylvester, Jared and Yacci, Paul and Ward, Rebecca and Tracy, Anna and McLean, Mark and Nicholas, Charles",
	doi = "10.1007/s11416-016-0283-1",
	issn = "2263-8733",
	journal = "Journal of Computer Virology and Hacking Techniques",
	keywords = "byte n-grams,elastic-net,malware classification,multi-byte identifier",
	mendeley-groups = "Me",
	month = "sep",
	title = "{An investigation of byte n-gram features for malware classification}",
	url = "http://link.springer.com/10.1007/s11416-016-0283-1",
	year = 2016,
	topic = "malware,ngram"
}
