Date: Wednesday, April 13th, 2022
9:00 am – 10:00 am Pacific Time
12:00 pm – 1:00 pm Eastern Time
Location: Weekly Seminar, Zoom
Title: Reproducibility in Machine Learning
Reproducibility is vital to ensuring scientific conclusions are reliable, but failures of reproducibility have been a major issue in nearly all scientific areas of study in recent decades. A key issue underlying the reproducibility crisis is the explosion of methods for data generation, screening, testing, and analysis, where only the combinations producing the most significant results are reported. Such practices can lead to erroneous findings that appear to be significant, but don’t hold up when other researchers attempt to replicate them.
In this talk, we introduce a new notion of reproducibility for randomized algorithms. This notion ensures that with high probability, an algorithm returns exactly the same output when run with two samples from the same distribution. Despite the exceedingly strong demand of reproducibility, there are efficient reproducible algorithms for several fundamental problems in statistics and learning. We show that any statistical query algorithm can be made reproducible with a modest increase in sample complexity, and we use this to construct reproducible algorithms for finding approximate heavy-hitters and medians. Using these ideas, we give the first reproducible algorithm for learning halfspaces via a reproducible weak learner and a reproducible boosting algorithm. We initiate the study of lower bounds and inherent tradeoffs for reproducible algorithms, giving nearly tight sample complexity upper and lower bounds for reproducible SQ algorithms. Finally, we discuss connections to other well-studied notions of algorithmic stability, such as differential privacy.
This talk is based on joint work with Russell Impagliazzo (UCSD), Rex Lei (UCSD), and Toniann Pitassi (Columbia University), to appear in STOC 2022.
Jessica Sorrell is a PhD student at UC San Diego advised by Russell Impagliazzo and Daniele Micciancio. She is broadly interested in responsible computing, particularly questions related to fairness, privacy, robustness, and reproducibility. In the past she has worked on lattice-based cryptography, with a focus on secure computation. In the fall she will be joining the University of Pennsylvania as a postdoc with Aaron Roth.