We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial distribution that is fair but inaccurate, our approach shifts towards a better fit to the data while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we prove that the learned distribution satisfies data fairness guarantees on both representation rate and statistical rate. Unlike recent optimization-based pre-processing methods, our approach can be easily adapted to continuous-domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are presented to demonstrate the quality of the results on real-world data.
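To make the two fairness measures named above concrete, the following is a minimal sketch of how they could be estimated from a finite sample. It assumes the definitions commonly used in the fair pre-processing literature, which the abstract itself does not spell out: representation rate as the smallest ratio of group probabilities P(S=s)/P(S=s'), and statistical rate as the smallest ratio of positive-outcome probabilities P(Y=1|S=s)/P(Y=1|S=s') across groups. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def representation_rate(sensitive):
    """Smallest ratio P(S=s) / P(S=s') over all pairs of observed groups.

    Assumed definition; a value of 1.0 means all groups are equally represented.
    """
    counts = Counter(sensitive)
    return min(counts.values()) / max(counts.values())


def statistical_rate(sensitive, labels, positive=1):
    """Smallest ratio P(Y=positive | S=s) / P(Y=positive | S=s') over group pairs.

    Assumed definition; a value of 1.0 means every group receives the
    positive outcome at the same empirical rate.
    """
    totals = Counter(sensitive)
    positives = Counter(s for s, y in zip(sensitive, labels) if y == positive)
    rates = [positives[s] / totals[s] for s in totals]
    highest = max(rates)
    # If no group has any positive outcome, all groups are treated identically.
    return 1.0 if highest == 0 else min(rates) / highest


# Hypothetical toy sample: group "a" has 4 members, group "b" has 2.
sensitive = ["a", "a", "a", "a", "b", "b"]
labels = [1, 1, 1, 0, 1, 0]

rep = representation_rate(sensitive)
stat = statistical_rate(sensitive, labels)
print(f"representation rate: {rep:.3f}")
print(f"statistical rate:    {stat:.3f}")
```

Both quantities lie in [0, 1], with values closer to 1 indicating a more balanced sample under these assumed definitions. A guarantee of the kind described above would bound such ratios from below for the learned distribution rather than for a finite sample.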