Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track
Carol Long, Hsiang Hsu, Wael Alghamdi, Flavio Calmon
Machine learning tasks may admit multiple competing models that achieve similar performance yet produce conflicting outputs for individual samples---a phenomenon known as predictive multiplicity. We demonstrate that fairness interventions in machine learning optimized solely for group fairness and accuracy can exacerbate predictive multiplicity. Consequently, state-of-the-art fairness interventions can mask high predictive multiplicity behind favorable group fairness and accuracy metrics. We argue that a third axis of ``arbitrariness'' should be considered when deploying models to aid decision-making in applications of individual-level impact.To address this challenge, we propose an ensemble algorithm applicable to any fairness intervention that provably ensures more consistent predictions.