Date: Wednesday, May 3rd, 2023
9:00 am – 10:00 am Pacific Time
12:00 pm – 1:00 pm Eastern Time
Location: Weekly Seminar, Zoom
Title: Post-hoc Explanations: Unifications, Robustness, and Disagreements
As machine learning black boxes are increasingly deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post-hoc manner. In the first part of the talk, we analyze two popular post-hoc explanation methods: SmoothGrad, which is gradient-based, and a variant of LIME, which is perturbation-based. We show that both methods converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite-sample complexity bounds on the number of perturbations these methods require to converge to their expectation.
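To make the convergence-in-expectation idea concrete, here is a minimal sketch of SmoothGrad on a toy model with an analytic gradient (the quadratic function, `model_grad`, and all parameter values are illustrative assumptions, not the models or settings from the talk): the method averages gradients over Gaussian perturbations of the input, and as the number of samples grows, the average approaches its expectation.

```python
import numpy as np

def model_grad(x):
    """Gradient of the toy model f(x) = x[0]**2 + 3*x[1] with respect to x."""
    return np.array([2.0 * x[0], 3.0])

def smoothgrad(x, sigma=0.1, n_samples=1000, seed=0):
    """Average the model gradient over n_samples Gaussian perturbations of x."""
    rng = np.random.default_rng(seed)
    grads = [model_grad(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

x = np.array([1.0, 2.0])
explanation = smoothgrad(x)
# With many samples, the average concentrates around the expected gradient
# under the perturbation distribution (here [2.0, 3.0]); the talk's result is
# that a variant of LIME converges to this same limit.
```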
Moreover, it has been observed that the explanations output by different post-hoc explanation methods can disagree with each other. In the second part of the talk, we introduce and study the disagreement problem in explainable machine learning by formalizing the notion of disagreement between explanations and analyzing how often such disagreements occur in practice. To this end, we first conduct interviews with data scientists to understand what constitutes disagreement between explanations and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis of six state-of-the-art post-hoc explanation methods (including SmoothGrad and LIME), measuring the extent of disagreement between the explanations they generate. Finally, we carry out an online user study with data scientists to understand how they resolve such disagreements in their day-to-day jobs. Our results indicate that state-of-the-art explanation methods often disagree significantly, underscoring the importance of developing principled evaluation metrics that enable practitioners to effectively compare explanations.
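As a rough illustration of how disagreement between two explanations might be quantified, the sketch below computes two plausible metrics in the spirit of such a framework: overlap between top-k most important features, and Spearman rank correlation of the attribution vectors. The metric definitions, function names, and attribution values here are illustrative assumptions, not the exact measures from the work described above.

```python
import numpy as np

def top_k_agreement(expl_a, expl_b, k=3):
    """Fraction of overlap between the k features each method deems most important
    (importance measured by absolute attribution value)."""
    top_a = set(np.argsort(-np.abs(expl_a))[:k])
    top_b = set(np.argsort(-np.abs(expl_b))[:k])
    return len(top_a & top_b) / k

def rank_correlation(expl_a, expl_b):
    """Spearman rank correlation between two attribution vectors
    (Pearson correlation of the feature ranks; assumes no ties)."""
    ra = np.argsort(np.argsort(expl_a)).astype(float)
    rb = np.argsort(np.argsort(expl_b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))

# Two hypothetical attribution vectors for the same prediction:
lime_attr = np.array([0.9, 0.1, -0.4, 0.05, 0.3])
grad_attr = np.array([0.8, -0.3, 0.2, 0.4, 0.1])

agreement = top_k_agreement(lime_attr, grad_attr)  # only 1 of the top 3 features shared
corr = rank_correlation(lime_attr, grad_attr)      # weak rank correlation
```

Low values on both metrics for the same prediction would signal exactly the kind of disagreement the talk examines.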
This is joint work with Chirag Agarwal, Sushant Agarwal, Alex Gu, Tessa Han, Satya Krishna, Hima Lakkaraju, Javin Pombra, Sohini Upadhyay, and Steven Wu, and is based on the following two papers: (1) https://arxiv.org/abs/2102.10618 (ICML 2021) and (2) https://arxiv.org/abs/2202.01602 (Working Paper)
Shahin Jabbari is an assistant professor in the Computer Science Department of the College of Computing & Informatics at Drexel University. Shahin received his Ph.D. in Computer and Information Science from the University of Pennsylvania and then spent two years as a CRCS postdoctoral fellow in the Computer Science Department at Harvard University. Shahin's research interests span machine learning and game theory, with a focus on the societal aspects of algorithmic decision-making.