Machine learning models often learn to make predictions that rely on sensitive social attributes such as gender and race. This poses significant fairness risks, especially in societal applications such as hiring, banking, and criminal justice. Existing work tackles this issue by minimizing the information about social attributes that a model uses. However, the high correlation between the target task and these social attributes makes learning the target task incompatible with debiasing. Since model bias arises from learning bias features (\emph{e.g.}, gender) that help optimize the target task, we explore the following research question: \emph{Can we leverage shortcut features to replace the role of bias features in target task optimization for debiasing?} To this end, we propose \emph{Shortcut Debiasing}, which first transfers the target task's learning of bias attributes from bias features to shortcut features, and then employs causal intervention to eliminate the shortcut features during inference. The key idea of \emph{Shortcut Debiasing} is to design controllable shortcut features that, on the one hand, replace bias features in contributing to the target task during training and, on the other hand, can be easily removed by intervention during inference. This ensures that learning the target task does not hinder the elimination of bias features. We apply \emph{Shortcut Debiasing} to several benchmark datasets and achieve significant improvements over state-of-the-art debiasing methods in both accuracy and fairness.
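The train/inference asymmetry described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot shortcut encoding, the uniform vector used as the inference-time intervention, and the names `shortcut_vector` and `head_input` are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def shortcut_vector(bias_attr: int, num_groups: int, training: bool) -> List[float]:
    """Return a hypothetical shortcut feature for one sample.

    Training: a one-hot encoding of the bias attribute, so the target head
    can read the attribute from the shortcut rather than from learned features.
    Inference: a fixed uniform vector (an assumed form of intervention), so
    the shortcut carries no information about the sample's group.
    """
    if not 0 <= bias_attr < num_groups:
        raise ValueError("bias_attr out of range")
    if training:
        return [1.0 if g == bias_attr else 0.0 for g in range(num_groups)]
    return [1.0 / num_groups] * num_groups


def head_input(features: Sequence[float], bias_attr: int,
               num_groups: int, training: bool) -> List[float]:
    """Concatenate learned features with the shortcut feature."""
    return list(features) + shortcut_vector(bias_attr, num_groups, training)


# Training: the shortcut exposes the group label to the target head.
train_x = head_input([0.5, -1.0], bias_attr=1, num_groups=2, training=True)

# Inference: the shortcut is replaced by the same constant for every group.
test_x = head_input([0.5, -1.0], bias_attr=1, num_groups=2, training=False)
```

In this sketch the inference-time input is identical for samples that differ only in their bias attribute, which is the property the intervention is meant to provide. Whether the learned features themselves stay free of bias information depends on the training procedure, which the sketch does not model.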