We study the problem of post-processing a supervised machine-learned regressor to maximize the fairness of the binary classifications it induces at all decision thresholds. Specifically, we show that decreasing the statistical distance between the groups' score distributions improves fairness across all thresholds at once, and that this can be done without a significant decrease in accuracy. To this end, we introduce a formal measure of distributional parity, which captures the degree of similarity in the distributions of classifications for different protected groups. In contrast to prior work, which has been limited to studying demographic parity across all thresholds, our measure applies to a large class of fairness metrics. Our main contribution is a novel post-processing algorithm based on optimal transport, which provably maximizes distributional parity. We support this result with experiments on several fairness benchmarks.
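To make the idea of shrinking the distance between group score distributions concrete, the following is a minimal sketch of one standard optimal-transport repair for one-dimensional scores: each group's scores are mapped onto the Wasserstein barycenter of the group distributions by quantile matching. It is an illustration under stated assumptions and not necessarily the algorithm proposed in the paper. The function names `barycenter_repair` and `max_rate_gap`, the quantile-grid size, and the synthetic data are all hypothetical, and the gap reported here covers only the positive-rate (demographic parity) case, whereas the abstract's measure is stated to cover a larger class of fairness metrics.

```python
import numpy as np


def barycenter_repair(scores, groups, n_quantiles=101):
    """Map each group's scores onto the 1-D Wasserstein barycenter.

    Hypothetical helper: in one dimension, the barycenter's quantile
    function is the weighted average of the groups' quantile functions.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    qs = np.linspace(0.0, 1.0, n_quantiles)

    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()

    # Empirical quantile function of each group on a shared grid.
    group_q = np.stack([np.quantile(scores[groups == g], qs) for g in labels])
    # Barycenter quantile function: weighted average across groups.
    bary_q = weights @ group_q

    repaired = np.empty_like(scores)
    for g, gq in zip(labels, group_q):
        mask = groups == g
        # Approximate within-group CDF value of each score ...
        ranks = np.interp(scores[mask], gq, qs)
        # ... then read off the barycenter score at the same quantile.
        repaired[mask] = np.interp(ranks, qs, bary_q)
    return repaired


def max_rate_gap(scores, groups, thresholds):
    """Largest between-group gap in positive rate over all thresholds."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    gaps = []
    for t in thresholds:
        rates = [(scores[groups == g] > t).mean() for g in labels]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Synthetic example (assumed data): two groups with shifted score distributions.
rng = np.random.default_rng(0)
demo_scores = np.concatenate([rng.normal(0.4, 0.10, 500),
                              rng.normal(0.6, 0.15, 500)])
demo_groups = np.array([0] * 500 + [1] * 500)
demo_thresholds = np.linspace(0.0, 1.0, 51)

demo_repaired = barycenter_repair(demo_scores, demo_groups)
print("max gap before:", max_rate_gap(demo_scores, demo_groups, demo_thresholds))
print("max gap after: ", max_rate_gap(demo_repaired, demo_groups, demo_thresholds))
```

Because the mapping is monotone within each group, the ordering of individuals inside a group is preserved; only the shape of each group's score distribution changes. A full repair like this one does not by itself address the accuracy trade-off mentioned in the abstract.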