Machine learning tasks may admit multiple competing models that achieve similar performance yet produce conflicting outputs for individual samples, a phenomenon known as predictive multiplicity. We demonstrate that fairness interventions optimized solely for group fairness and accuracy can exacerbate predictive multiplicity. Consequently, state-of-the-art fairness interventions can mask high predictive multiplicity behind favorable group fairness and accuracy metrics. We argue that a third axis, "arbitrariness", should be considered when deploying models to aid decision-making in applications with individual-level impact. To address this challenge, we propose an ensemble algorithm, applicable to any fairness intervention, that provably ensures more consistent predictions.
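As a rough illustration of the two ideas above, the sketch below measures predictive multiplicity as the fraction of samples on which a set of competing models disagree (a simplified ambiguity-style score), and then shows how majority-vote ensembles built from disjoint groups of those models can disagree less with each other. This is a toy under stated assumptions, not the ensemble algorithm proposed here: the predictions are hypothetical binary outputs chosen by hand, and `ambiguity` and `majority_vote` are illustrative helpers.

```python
from typing import List, Sequence


def ambiguity(preds: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on which at least two models give different labels."""
    n_samples = len(preds[0])
    conflicted = sum(1 for j in range(n_samples) if len({p[j] for p in preds}) > 1)
    return conflicted / n_samples


def majority_vote(preds: Sequence[Sequence[int]]) -> List[int]:
    """Per-sample binary majority vote (ties, if any, go to label 1)."""
    m = len(preds)
    return [int(2 * sum(p[j] for p in preds) >= m) for j in range(len(preds[0]))]


# Hypothetical predictions of six competing models (rows) on five samples (columns).
models = [
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

# Assumption for illustration: build two ensembles from disjoint groups of models.
ens_a = majority_vote(models[:3])
ens_b = majority_vote(models[3:])

print(ambiguity(models))          # individual models disagree on every sample
print(ambiguity([ens_a, ens_b]))  # the two ensembles agree on every sample
```

On this contrived input the individual models conflict on all five samples, while the two ensembles agree everywhere; real data will rarely be this clean, and the grouping and voting rule are choices made only for the example.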