Most work on the fairness of machine learning systems focuses on blindly optimizing common group fairness metrics, such as Demographic Parity and Equalized Odds. In this paper, we conduct a comparative study of several bias mitigation approaches and examine their behavior at a fine-grained level: that of individual predictions. Our objective is to characterize the differences between fair models obtained with different approaches. When fairness and accuracy are comparable, do different bias mitigation approaches impact a similar number of individuals? Do they mitigate bias in a similar way? Do they affect the same individuals when debiasing a model? Our findings show that bias mitigation approaches differ substantially in their strategies, both in the number of individuals they impact and in the populations they target. More surprisingly, these differences appear even across several runs of the same mitigation approach. These findings raise questions about the limitations of current group fairness metrics, as well as about the arbitrariness, and hence unfairness, of the whole debiasing process.
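To make the prediction-level question concrete, the following is a minimal sketch, not the code used in this study. It assumes binary predictions and a binary sensitive attribute held in NumPy arrays; the function names (`demographic_parity_diff`, `equalized_odds_diff`, `prediction_overlap`) and the toy data are hypothetical and chosen only for illustration. It shows how two debiased models can reach the same Demographic Parity gap while changing the predictions of entirely different individuals.

```python
import numpy as np


def demographic_parity_diff(y_pred, group):
    """Absolute gap in positive-prediction rates between the two groups."""
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())


def equalized_odds_diff(y_true, y_pred, group):
    """Largest between-group gap in false-positive rate or true-positive rate."""
    gaps = []
    for label in (0, 1):  # label 0 gives the FPR gap, label 1 the TPR gap
        mask = y_true == label
        rate_1 = y_pred[mask & (group == 1)].mean()
        rate_0 = y_pred[mask & (group == 0)].mean()
        gaps.append(abs(rate_1 - rate_0))
    return max(gaps)


def prediction_overlap(base_pred, fair_a, fair_b):
    """Count the individuals each fair model changes relative to the baseline,
    and measure how much the two changed sets overlap (Jaccard index)."""
    changed_a = base_pred != fair_a
    changed_b = base_pred != fair_b
    both = int((changed_a & changed_b).sum())
    either = int((changed_a | changed_b).sum())
    return {
        "changed_a": int(changed_a.sum()),
        "changed_b": int(changed_b.sum()),
        "jaccard": both / either if either else 1.0,
    }


# Hypothetical toy data: 8 individuals, 4 per group.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
base = np.array([1, 1, 1, 0, 1, 0, 0, 0])    # unmitigated model
fair_a = np.array([1, 1, 1, 0, 1, 1, 1, 0])  # raises positives in group 1
fair_b = np.array([1, 1, 0, 0, 1, 0, 0, 1])  # changes one person per group

for name, pred in [("base", base), ("fair_a", fair_a), ("fair_b", fair_b)]:
    print(name,
          "DP gap:", demographic_parity_diff(pred, group),
          "EO gap:", equalized_odds_diff(y_true, pred, group))

print(prediction_overlap(base, fair_a, fair_b))
```

In this toy setting both mitigated models close the Demographic Parity gap, yet the sets of individuals whose predictions change do not intersect. A group-level metric alone cannot reveal that kind of difference.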