To address the bias exhibited by machine learning models, fairness criteria impose statistical constraints that ensure equal treatment of all demographic groups, but typically at a cost to model performance. Understanding this tradeoff is therefore fundamental to the design of fair and effective algorithms. This paper completes the characterization of the inherent tradeoff of demographic parity on classification problems in the most general multigroup, multiclass, and noisy setting. Specifically, we show that the minimum error rate achievable under demographic parity is given by the optimal value of a Wasserstein-barycenter problem. More practically, this reformulation leads to a simple procedure for post-processing any pre-trained predictor to satisfy demographic parity in the general setting, which, in particular, yields the optimal fair classifier when applied to the Bayes predictor. We provide suboptimality and finite-sample analyses for our procedure, and demonstrate precise control of the tradeoff between error rate and fairness on real-world datasets, given sufficient data.
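As a point of reference for the fairness criterion named above, the following is a minimal sketch of how a violation of demographic parity could be measured for a multigroup, multiclass classifier. It is not the paper's post-processing procedure: the function name `demographic_parity_gap` and the choice of gap (the largest difference, over classes, between the highest and lowest per-group prediction rates) are assumptions made for illustration, and the paper may quantify violations differently.

```python
import numpy as np

def demographic_parity_gap(y_pred, groups, num_classes):
    """Hypothetical helper: largest per-class difference in prediction
    rates across groups. A value of 0 means the predicted-class
    distribution is identical in every group (exact demographic parity
    on this sample)."""
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rates = []
    for g in np.unique(groups):
        mask = groups == g
        # Empirical distribution of predicted classes within group g.
        counts = np.bincount(y_pred[mask], minlength=num_classes)
        rates.append(counts / mask.sum())
    rates = np.stack(rates)  # shape: (num_groups, num_classes)
    # For each class, spread between the highest and lowest group rate;
    # report the worst class.
    return float((rates.max(axis=0) - rates.min(axis=0)).max())
```

For example, calling `demographic_parity_gap([0, 1, 0, 1], [0, 0, 1, 1], num_classes=2)` compares two groups that each receive class 0 and class 1 equally often, so the gap is zero; predictions that perfectly separate the groups give a gap of one.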