Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we propose a principled framework that adjusts data representations to balance predictive utility and fairness. Using sufficient dimension reduction, we decompose the feature space into target-relevant, sensitive, and shared components, and control the fairness-utility trade-off by selectively removing sensitive information. We provide a theoretical analysis of how prediction error and fairness gaps evolve as shared subspaces are added, and employ influence functions to quantify their effects on the asymptotic behavior of parameter estimates. Experiments on both synthetic and real-world datasets validate our theoretical insights and show that the proposed method effectively improves fairness while preserving predictive performance.
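To make the described decomposition concrete, the sketch below is a minimal, illustrative example and not the paper's implementation: it uses a basic sliced inverse regression (SIR) estimator to obtain directions predictive of the target y and of the sensitive attribute a, splits the sensitive subspace into a part shared with the target subspace and a sensitive-only remainder, and forms the representation by projecting out the sensitive-only directions (and, optionally, some shared ones). The function names and the `remove_shared` knob are hypothetical stand-ins for the paper's fairness-utility trade-off parameter.

```python
# Minimal sketch (assumptions noted above), not the authors' code.
import numpy as np

def sir_directions(X, z, n_slices=10, n_dirs=2):
    """Estimate sufficient-dimension-reduction directions for response z via SIR."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(p)      # regularized covariance
    w, V = np.linalg.eigh(cov)
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T                # whitening matrix
    Z = Xc @ inv_sqrt
    # Slice observations by the response and average whitened predictors per slice.
    slices = np.array_split(np.argsort(z), n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original predictor scale.
    _, vecs = np.linalg.eigh(M)
    dirs = inv_sqrt @ vecs[:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)

def fair_representation(X, y, a, n_dirs=2, remove_shared=0):
    """Project out sensitive-only directions, plus `remove_shared` shared ones."""
    B_y = sir_directions(X, y, n_dirs=n_dirs)               # target-relevant subspace
    B_a = sir_directions(X, a, n_dirs=n_dirs, n_slices=2)   # sensitive subspace
    # Split the sensitive subspace into its component inside span(B_y) ("shared")
    # and the orthogonal remainder ("sensitive-only"); degenerate ranks are ignored
    # here for brevity.
    Q_y, _ = np.linalg.qr(B_y)
    proj = Q_y @ (Q_y.T @ B_a)
    shared, _ = np.linalg.qr(proj)
    sens_only, _ = np.linalg.qr(B_a - proj)
    to_remove = np.hstack([sens_only, shared[:, :remove_shared]])
    Q, _ = np.linalg.qr(to_remove)
    P = np.eye(X.shape[1]) - Q @ Q.T                         # projector off removed dirs
    return (X - X.mean(axis=0)) @ P

# Toy usage: increasing remove_shared trades predictive utility for a smaller fairness gap.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 6)) + np.outer(a, [1.0, 0.5, 0.0, 0.0, 0.0, 0.0])
y = X[:, 0] + X[:, 2] + 0.1 * rng.normal(size=500)
X_fair = fair_representation(X, y, a, n_dirs=2, remove_shared=1)
```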