Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we propose a principled framework that adjusts data representations to balance predictive utility and fairness. Using sufficient dimension reduction, we decompose the feature space into target-relevant, sensitive, and shared components, and control the fairness-utility trade-off by selectively removing sensitive information. We provide a theoretical analysis of how prediction error and fairness gaps evolve as shared subspaces are added, and employ influence functions to quantify their effects on the asymptotic behavior of parameter estimates. Experiments on both synthetic and real-world datasets validate our theoretical insights and show that the proposed method effectively improves fairness while preserving predictive performance.
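The core idea can be illustrated with a small, self-contained sketch. The setup below is purely illustrative and not taken from the paper: features are synthetic, the sensitive subspace is estimated with sliced inverse regression (a standard sufficient dimension reduction method), and "removing sensitive information" is implemented as projecting the features onto the orthogonal complement of that estimated direction. All names (`sir_direction`, `demographic_gap`, the data-generating coefficients) are assumptions made for this example.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, d = 5000, 6

# Illustrative synthetic data: one feature direction drives only the
# target, one only the sensitive attribute, and one is shared by both.
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)   # target
a = (X[:, 1] + 0.5 * X[:, 2] > 0).astype(float)          # sensitive attr

def sir_direction(X, t, n_slices=10):
    """Leading sliced-inverse-regression (SIR) direction for E[X | t]."""
    Xc = X - X.mean(axis=0)
    slices = np.array_split(np.argsort(t), n_slices)
    M = np.zeros((X.shape[1], X.shape[1]))
    for s in slices:
        m = Xc[s].mean(axis=0)
        M += len(s) * np.outer(m, m) / len(t)
    Sigma = np.cov(Xc, rowvar=False)
    # Generalized eigenproblem M v = lambda * Sigma v; take top vector.
    _, vecs = eigh(M, Sigma)
    v = vecs[:, -1]
    return v / np.linalg.norm(v)

def demographic_gap(pred, a):
    """Absolute difference in mean prediction between the two groups."""
    return abs(pred[a == 1].mean() - pred[a == 0].mean())

def r2(pred, y):
    return 1 - np.mean((y - pred) ** 2) / np.var(y)

# Baseline: ordinary least squares on the raw features.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta

# Adjusted representation: project out the estimated sensitive
# direction entirely (i.e. the shared component is removed too,
# trading a little utility for fairness).
u = sir_direction(X, a, n_slices=2)
X_fair = X - np.outer(X @ u, u)
beta_f, *_ = np.linalg.lstsq(X_fair, y, rcond=None)
pred_f = X_fair @ beta_f

print(f"R^2: {r2(pred, y):.3f} -> {r2(pred_f, y):.3f}")
print(f"gap: {demographic_gap(pred, a):.3f} "
      f"-> {demographic_gap(pred_f, a):.3f}")
```

In this toy setting, removing only the sensitive-specific direction (orthogonal to the target direction) would leave predictions, and hence the group gap, unchanged, because the gap flows through the shared component; removing the full sensitive direction, as above, shrinks the gap substantially at a small cost in R^2. This is the trade-off the framework controls by choosing how much of the shared subspace to retain.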