Fair predictive algorithms hinge on both equality and trust, yet inherent uncertainty in real-world data challenges our ability to make consistent, fair, and calibrated decisions. While fairly managing predictive error has been extensively explored, some recent work has begun to address the challenge of fairly accounting for irreducible prediction uncertainty. However, a clear taxonomy and well-specified objectives for integrating uncertainty into fairness remain undefined. We address this gap by introducing FairlyUncertain, an axiomatic benchmark for evaluating uncertainty estimates in fairness. Our benchmark posits that fair predictive uncertainty estimates should be consistent across learning pipelines and calibrated to observed randomness. Through extensive experiments on ten popular fairness datasets, our evaluation reveals: (1) A theoretically justified and simple method for estimating uncertainty in binary settings is more consistent and calibrated than prior work; (2) Abstaining from binary predictions, even with improved uncertainty estimates, reduces error but does not alleviate outcome imbalances between demographic groups; (3) Incorporating consistent and calibrated uncertainty estimates in regression tasks improves fairness without any explicit fairness interventions. Additionally, our benchmark package is extensible and open-source, designed to grow with the field. By providing a standardized framework for assessing the interplay between uncertainty and fairness, FairlyUncertain paves the way for more equitable and trustworthy machine learning practices.