As AI-based decision systems proliferate, their successful operationalization requires balancing multiple desiderata: predictive performance, disparity across groups, protection of sensitive group attributes (e.g., race), and engineering cost. We present a holistic framework for evaluating and contextualizing fairness interventions with respect to these desiderata. The two key points of practical consideration are where the intervention is introduced (pre-, in-, or post-processing) and how it uses the sensitive group data. We demonstrate our framework with a thorough benchmarking study on predictive parity, covering close to 400 methodological variations across two major model types (XGBoost vs. Neural Net) and ten datasets. Methodological insights derived from our empirical study inform the practical design of ML workflows with fairness as a central concern. We find that predictive parity is difficult to achieve without using group data, and that distributionally robust methods, despite requiring group data during model training (but not inference), provide significant Pareto improvements. Moreover, a plain XGBoost model often Pareto-dominates neural networks with fairness interventions, highlighting the importance of model inductive bias.
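To make the two evaluation notions in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common textbook reading of predictive parity as equal positive predictive value (PPV) across groups, and treats each method as a point (error, parity gap) where lower is better on both axes. The study itself may operationalize predictive parity differently. The function names `ppv_by_group`, `predictive_parity_gap`, and `pareto_dominates` and the toy data are hypothetical.

```python
from collections import defaultdict


def ppv_by_group(y_true, y_pred, groups):
    """Positive predictive value P(y = 1 | y_hat = 1), computed per group.

    Groups with no positive predictions are omitted from the result.
    """
    true_pos = defaultdict(int)
    pred_pos = defaultdict(int)
    for y, y_hat, g in zip(y_true, y_pred, groups):
        if y_hat == 1:
            pred_pos[g] += 1
            true_pos[g] += int(y == 1)
    return {g: true_pos[g] / pred_pos[g] for g in pred_pos}


def predictive_parity_gap(y_true, y_pred, groups):
    """Largest PPV difference between any two groups (0 means parity)."""
    ppv = ppv_by_group(y_true, y_pred, groups)
    if not ppv:
        raise ValueError("no group has any positive predictions")
    return max(ppv.values()) - min(ppv.values())


def pareto_dominates(a, b):
    """True if point a is no worse than b on every axis and strictly
    better on at least one. Assumes lower is better on all axes."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# Toy data (illustrative only): two groups, four examples each.
y_true = [1, 0, 1, 1, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 1, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = predictive_parity_gap(y_true, y_pred, groups)

# Hypothetical (error, parity gap) summaries for two methods.
method_1 = (0.10, 0.02)
method_2 = (0.12, 0.05)
print(gap, pareto_dominates(method_1, method_2))
```

Note that the group labels are needed only to compute the gap at evaluation time. Whether a given intervention also needs them at training or inference time is the separate "how" axis the abstract describes.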