Fairness is essential for machine learning systems deployed in high-stakes applications. Among fairness notions, individual fairness, which derives from the consensus that "similar individuals should be treated similarly," is vital for describing fair treatment in individual cases. Previous studies typically characterize individual fairness as a prediction-invariance problem under perturbations of the sensitive attributes of samples, and solve it with the Distributionally Robust Optimization (DRO) paradigm. However, the adversarial perturbations used in DRO, which move samples along directions covering sensitive information, do not consider inherent feature correlations or innate data constraints, and could therefore mislead the model into optimizing on off-manifold, unrealistic samples. In light of this drawback, we propose to learn and generate antidote data that approximately follow the data distribution to remedy individual unfairness. These generated on-manifold antidote data can be used together with the original training data in a generic optimization procedure, yielding a pure pre-processing approach to individual unfairness, or can be combined with the in-processing DRO paradigm. Through extensive experiments on multiple tabular datasets, we demonstrate that our method resists individual unfairness at minimal or zero cost to predictive utility compared to baselines.
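As a rough illustration of the prediction-invariance view of individual fairness described above, the following sketch measures how often a classifier's prediction stays the same when a sensitive attribute is flipped. It is not the method proposed in this paper: the function name `flip_consistency`, the two toy models, and the assumption of a single binary (0/1) sensitive column are all hypothetical choices made for illustration.

```python
import numpy as np

def flip_consistency(predict, X, sensitive_col):
    """Fraction of rows whose predicted label is unchanged when the
    binary (0/1) sensitive column is flipped. Illustrative only."""
    X = np.asarray(X, dtype=float)
    X_flipped = X.copy()
    # Naive perturbation: flip the sensitive attribute, leave all else fixed.
    X_flipped[:, sensitive_col] = 1.0 - X_flipped[:, sensitive_col]
    return float(np.mean(predict(X) == predict(X_flipped)))

# Hypothetical toy models; column 0 is assumed to be the sensitive attribute.
def biased_model(X):
    # Uses the sensitive attribute directly in its decision.
    return (X[:, 0] + X[:, 1] > 1.0).astype(int)

def blind_model(X):
    # Ignores the sensitive attribute entirely.
    return (X[:, 1] > 0.5).astype(int)

X_demo = np.array([[0, 0.2], [1, 0.2], [0, 0.9], [1, 0.9]])
biased_score = flip_consistency(biased_model, X_demo, sensitive_col=0)
blind_score = flip_consistency(blind_model, X_demo, sensitive_col=0)
print(biased_score, blind_score)
```

Note that flipping one column while holding every other feature fixed is exactly the kind of perturbation that ignores feature correlations, so the flipped rows may be unrealistic; that limitation is what motivates generating on-manifold comparison samples instead.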