As machine learning (ML) algorithms are increasingly used in applications that involve humans, concerns have arisen that these algorithms may be biased against certain social groups. \textit{Counterfactual fairness} (CF) is a fairness notion proposed in Kusner et al. (2017) that measures the unfairness of ML predictions; it requires that the prediction perceived by an individual in the real world have the same marginal distribution as it would in a counterfactual world in which the individual belonged to a different group. Although CF ensures fair ML predictions, it fails to consider the downstream effects of those predictions on individuals. Because humans are strategic and often adapt their behavior in response to the ML system, predictions that satisfy CF may not lead to a fair future outcome for the individuals. In this paper, we introduce \textit{lookahead counterfactual fairness} (LCF), a fairness notion that accounts for the downstream effects of ML models by requiring the individual's \textit{future status} to be counterfactually fair. We theoretically identify conditions under which LCF can be satisfied and propose an algorithm based on these results. We also extend the concept to path-dependent fairness. Experiments on both synthetic and real data validate the proposed method.
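For reference, a minimal sketch of the CF condition described above, in the notation of Kusner et al. (2017); here $A$ denotes the sensitive attribute, $X$ the remaining features, $U$ the latent background variables of the causal model, and $\hat{Y}$ the prediction, with the specific symbols chosen only for illustration:
\[
P\big(\hat{Y}_{A\leftarrow a}(U) = y \mid X = x, A = a\big) \;=\; P\big(\hat{Y}_{A\leftarrow a'}(U) = y \mid X = x, A = a\big), \quad \forall\, y,\; \forall\, a' ,
\]
i.e., the distribution of the prediction must be unchanged under a counterfactual intervention that sets the sensitive attribute to any other value $a'$. As described in this paper, LCF would instead impose such an equality on the distribution of the individual's \textit{future status} rather than on the prediction $\hat{Y}$ itself.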