The use of machine learning models in high-stakes applications (e.g., healthcare, lending, college admission) has raised growing concerns about potential biases against protected social groups. Various fairness notions and methods have been proposed to mitigate such biases. In this work, we focus on Counterfactual Fairness (CF), a fairness notion that depends on an underlying causal graph and was first proposed by Kusner \textit{et al.}~\cite{kusner2017counterfactual}. CF requires that the outcome an individual perceives in the real world be the same as the outcome they would perceive in a ``counterfactual'' world in which they belong to another social group. Learning fair models that satisfy CF can be challenging. It was shown in \cite{kusner2017counterfactual} that a sufficient condition for satisfying CF is to \textbf{not} use features that are descendants of sensitive attributes in the causal graph. This suggests a simple method that learns CF models using only the non-descendants of sensitive attributes and discards all of their descendants. Although several subsequent works have proposed methods that use all features to train CF models, there is no theoretical guarantee that these methods satisfy CF. In contrast, this work proposes a new algorithm that trains models using all of the available features. We show both theoretically and empirically that models trained with this method can satisfy CF\footnote{The code repository for this work is available at \url{https://github.com/osu-srml/CF_Representation_Learning}}.
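To make the simple baseline concrete (the one that uses only non-descendants of sensitive attributes, \emph{not} the algorithm proposed in this work), the following is a minimal sketch of the feature-selection step. It assumes the causal graph is given as a Python dictionary that maps each node to its list of children. The graph, the node names (\texttt{race}, \texttt{sex}, \texttt{gpa}, \texttt{lsat}, \texttt{fya}, \texttt{age}), and the helper functions \texttt{descendants} and \texttt{non\_descendant\_features} are hypothetical and chosen only for illustration.

```python
from collections import deque

def descendants(graph, sources):
    """Return every node reachable from any node in `sources` via directed edges (BFS)."""
    seen = set()
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def non_descendant_features(graph, features, sensitive):
    """Keep only features that are neither sensitive attributes nor their descendants."""
    banned = descendants(graph, sensitive) | set(sensitive)
    return [f for f in features if f not in banned]

# Hypothetical causal DAG (parent -> children); not taken from the paper.
graph = {
    "race": ["gpa", "lsat"],
    "sex": ["gpa", "lsat"],
    "knowledge": ["gpa", "lsat"],
    "gpa": ["fya"],
    "lsat": ["fya"],
}
features = ["gpa", "lsat", "age"]  # "age" has no incoming edge from a sensitive attribute here

print(non_descendant_features(graph, features, ["race", "sex"]))  # ['age']
```

In this toy graph, both \texttt{gpa} and \texttt{lsat} are descendants of the sensitive attributes and are therefore discarded. This illustrates why the baseline can throw away most of the predictive information, which motivates methods that use all of the available features.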