We introduce a boosting algorithm that pre-processes data for fairness. Starting from an initial distribution that is fair but inaccurate, our approach shifts towards a better fit of the data while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we theoretically prove that the learned distribution satisfies data fairness guarantees in terms of both representation rate and statistical rate. Unlike recent optimization-based pre-processing methods, our approach can easily be adapted to features with continuous domains. Furthermore, when the weak learners are decision trees, the sufficient statistics of the learned distribution can be examined to provide clues about the sources of (un)fairness. Empirical results on real-world data are presented to demonstrate the quality of the results.
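The abstract leaves the update rule implicit, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a finite domain of (sensitive attribute `s`, label `y`, feature `x`) triples, a hypothetical pool of ±1 decision stumps as weak learners, and the usual definitions of representation rate (RR, the minimum over groups of P(S=s)/P(S=s')) and statistical rate (SR, the minimum over groups of P(Y=1|S=s)/P(Y=1|S=s')). It tilts a fair initial distribution `q0` towards the data as q(x) ∝ q0(x) exp(f(x)), where f is a weighted sum of stumps, and caps the total step size at a hypothetical `budget` B. The only guarantee it relies on is the elementary one for exponential tilting: |f| ≤ B implies RR(q) ≥ e^(-2B) RR(q0) and SR(q) ≥ e^(-4B) SR(q0).

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical finite domain of (s, y, x): sensitive attribute, binary label, feature.
DOMAIN = list(product([0, 1], [0, 1], range(4)))

def normalize(w):
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

def representation_rate(p):
    # min over groups s, s' of P(S=s) / P(S=s')
    marg = defaultdict(float)
    for (s, _y, _x), w in p.items():
        marg[s] += w
    return min(marg.values()) / max(marg.values())

def statistical_rate(p, label=1):
    # min over groups s, s' of P(Y=label | S=s) / P(Y=label | S=s')
    marg, joint = defaultdict(float), defaultdict(float)
    for (s, y, _x), w in p.items():
        marg[s] += w
        if y == label:
            joint[s] += w
    cond = [joint[s] / marg[s] for s in marg]
    return min(cond) / max(cond)

def kl(p, q):
    # KL(p || q) on the finite domain
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def stumps():
    # Hypothetical weak-learner pool: +/-1 stumps testing one coordinate for equality.
    pool = []
    for coord, values in ((0, [0, 1]), (1, [0, 1]), (2, range(4))):
        for v in values:
            for sign in (1.0, -1.0):
                pool.append(lambda pt, c=coord, v=v, sg=sign: sg if pt[c] == v else -sg)
    return pool

def boost(q0, data, rounds=20, step=0.1, budget=0.5):
    """Tilt q0 towards data: q(x) ∝ q0(x) * exp(f(x)), f = sum_t theta_t * c_t(x).

    Every c_t is +/-1 and theta_t > 0, so stopping once sum_t theta_t reaches
    `budget` keeps |f(x)| <= budget everywhere.
    """
    f = {pt: 0.0 for pt in q0}
    spent = 0.0
    pool = stumps()
    for _ in range(rounds):
        theta = min(step, budget - spent)
        if theta <= 0:
            break
        q = normalize({pt: q0[pt] * math.exp(f[pt]) for pt in q0})
        # Boosting-style choice: the stump with the largest edge E_data[c] - E_q[c].
        best = max(pool, key=lambda c: sum((data[pt] - q[pt]) * c(pt) for pt in q0))
        for pt in q0:
            f[pt] += theta * best(pt)
        spent += theta
    return normalize({pt: q0[pt] * math.exp(f[pt]) for pt in q0}), spent

def label_weight(s, y):
    # Hypothetical bias: group 0 favours y=1, group 1 favours y=0.
    return (2.0 if y == 1 else 1.0) if s == 0 else (1.0 if y == 1 else 2.0)

# Biased "data" distribution and a perfectly fair (uniform) starting point.
data = normalize({(s, y, x): (3.0 if s == 0 else 1.0) * label_weight(s, y) * (1 + x)
                  for (s, y, x) in DOMAIN})
q0 = normalize({pt: 1.0 for pt in DOMAIN})

q, B = boost(q0, data)
print("RR data/q0/q:", representation_rate(data), representation_rate(q0), representation_rate(q))
print("SR data/q0/q:", statistical_rate(data), statistical_rate(q0), statistical_rate(q))
print("KL(data||q0), KL(data||q):", kl(data, q0), kl(data, q))
```

Raising `budget` lets q fit the data more closely at the cost of a weaker fairness floor, which mirrors the trade-off between data fitting and a minimal fairness guarantee described above.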