Learning for Counterfactual Fairness from Observational Data

Fairness-aware machine learning has attracted growing attention in many domains, such as online advertising, personalized recommendation, and social media analysis in web applications. It aims to eliminate biases of learning models against subgroups defined by protected (sensitive) attributes such as race, gender, and age. Among the many existing fairness notions, counterfactual fairness is a popular one defined from a causal perspective: it measures the fairness of a predictor by comparing each individual's prediction in the original world with the predictions in counterfactual worlds where the value of the sensitive attribute has been modified. Existing methods for achieving counterfactual fairness require prior human knowledge of the causal model underlying the data. In real-world scenarios, however, this causal model is often unknown, and acquiring such knowledge can be very difficult. In these scenarios, it is risky to directly trust causal models obtained from information sources of unknown reliability, or even from causal discovery methods, because an incorrect causal model can introduce biases into the predictor and lead to unfair predictions. In this work, we address the problem of counterfactually fair prediction from observational data without a given causal model by proposing a novel framework, CLAIRE. Specifically, under certain general assumptions, CLAIRE mitigates biases stemming from the sensitive attribute through a representation learning framework based on counterfactual data augmentation and an invariant penalty. Experiments on both synthetic and real-world datasets demonstrate the superiority of CLAIRE in both counterfactual fairness and prediction performance.
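To make the counterfactual comparison concrete, here is a minimal sketch that assumes a fully known toy structural causal model. This is precisely the kind of prior knowledge that CLAIRE is designed to avoid, so this code is not CLAIRE. The model (latent U, binary sensitive attribute A, observed X = U + BETA * A), the constant BETA, and every function name are hypothetical. For each individual, the sketch computes a counterfactual via abduction (recover U), action (flip A), and prediction (regenerate X), then reports how much a predictor's output changes. The `invariance_penalty` function is only an illustrative squared-difference penalty between original and counterfactually augmented samples, in the spirit of the invariant penalty mentioned above; it is not CLAIRE's exact loss.

```python
import random
import statistics

# Hypothetical toy SCM (an assumption for illustration only):
#   U ~ N(0, 1)         latent factor, not caused by the sensitive attribute A
#   A ~ Bernoulli(0.5)  sensitive attribute
#   X = U + BETA * A    observed feature, a descendant of A
BETA = 2.0


def sample(n, seed=0):
    """Draw n observational (x, a) pairs from the toy SCM."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        a = rng.randint(0, 1)
        data.append((u + BETA * a, a))
    return data


def counterfactual(x, a):
    """Abduction-action-prediction under the assumed toy SCM."""
    u = x - BETA * a              # abduction: infer U from the observed (X, A)
    a_cf = 1 - a                  # action: intervene do(A = 1 - a)
    return u + BETA * a_cf, a_cf  # prediction: regenerate X in the counterfactual world


def naive_predictor(x, a):
    return 0.5 * x                # uses X directly, so it inherits A's causal effect


def fair_predictor(x, a):
    return 0.5 * (x - BETA * a)   # uses only the inferred U, a non-descendant of A


def counterfactual_gap(predictor, data):
    """Mean absolute change in prediction between factual and counterfactual worlds."""
    return statistics.fmean(
        abs(predictor(x, a) - predictor(*counterfactual(x, a))) for x, a in data
    )


def invariance_penalty(predictor, data):
    """Illustrative penalty: mean squared prediction difference between each
    sample and its counterfactually augmented copy (hypothetical, not CLAIRE's loss)."""
    return statistics.fmean(
        (predictor(x, a) - predictor(*counterfactual(x, a))) ** 2 for x, a in data
    )


if __name__ == "__main__":
    data = sample(1000)
    print(round(counterfactual_gap(naive_predictor, data), 6))  # 1.0
    print(round(counterfactual_gap(fair_predictor, data), 6))   # 0.0
```

Under this toy model, the naive predictor shifts by exactly 0.5 * BETA = 1.0 when A is flipped, while the predictor built on the inferred latent U is counterfactually fair. The catch, and the motivation for CLAIRE, is that `fair_predictor` and `counterfactual` both hard-code the causal model, which is generally unavailable from observational data alone.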