Fairness is essential for machine learning systems deployed in high-stakes applications. Among fairness notions, individual fairness, which derives from the consensus that "similar individuals should be treated similarly," is vital for describing fair treatment of individual cases. Previous studies typically characterize individual fairness as a prediction-invariance problem under perturbation of the sensitive attributes of samples, and solve it with the Distributionally Robust Optimization (DRO) paradigm. However, the adversarial perturbations used in DRO, taken along directions that encode sensitive information, do not account for inherent feature correlations or innate data constraints, and can therefore mislead the model into optimizing on off-manifold, unrealistic samples. To address this drawback, we propose to learn and generate antidote data that approximately follows the data distribution to remedy individual unfairness. These on-manifold antidote data can be combined with the original training data in a generic optimization procedure, yielding a pure pre-processing approach to individual unfairness, or can be integrated into the in-processing DRO paradigm. Through extensive experiments on multiple tabular datasets, we demonstrate that our method resists individual unfairness at minimal or zero cost to predictive utility compared to baselines.