Machine learning (ML) may be oblivious to human bias, but it is not immune to its perpetuation. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even amplified by the learning models. In the present work, we aim to clarify the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Using the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables that are commonly employed for fairness assessment. Despite the simplicity of the data model, we retrace and unpack typical unfairness behaviour observed on real-world datasets. We also obtain a detailed analytical characterisation of a class of bias mitigation strategies. We first consider a basic loss-reweighting scheme, which allows for an implicit minimisation of different unfairness metrics, and quantify the incompatibilities between some existing fairness criteria. Then, we consider a novel mitigation strategy based on a matched inference approach, which introduces coupled learning models. Our theoretical analysis of this approach shows that the coupled strategy can strike superior fairness-accuracy trade-offs.
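To make the loss-reweighting idea concrete, the following is a minimal numerical sketch and not the exactly solvable model analysed in this work. It assumes a toy geometry in which two groups are Gaussian clouds centred at opposite points and labelled by different linear rules, and it fits a single logistic classifier with a per-sample weight that upweights the minority group. All names and parameters (`sample_group`, `group_weights`, `minority_weight`, `shift`, the group sizes) are hypothetical choices made for illustration, and accuracy is measured on the training set for brevity.

```python
import numpy as np


def sample_group(n, centre, teacher, rng):
    """Draw n points around `centre` and label them with a linear rule (assumed toy setup)."""
    x = centre + rng.standard_normal((n, centre.size))
    y = (x @ teacher > 0).astype(float)
    return x, y


def group_weights(group, minority_weight):
    """Per-sample weights: 1 for the majority (0), `minority_weight` for the minority (1), normalised to sum to 1."""
    w = np.where(group == 1, minority_weight, 1.0)
    return w / w.sum()


def fit_logistic(x, y, w, lr=0.5, steps=500):
    """Gradient descent on the weighted logistic (cross-entropy) loss."""
    theta = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(x @ theta + b, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        err = w * (p - y)  # weighted residuals
        theta -= lr * (x.T @ err)
        b -= lr * err.sum()
    return theta, b


def group_accuracy(x, y, group, theta, b):
    """Accuracy of the shared classifier, computed separately for each group."""
    pred = (x @ theta + b > 0).astype(float)
    return {g: float((pred[group == g] == y[group == g]).mean()) for g in (0, 1)}


def run(minority_weight, n_major=900, n_minor=100, dim=20, shift=1.0, seed=0):
    """Train on an imbalanced two-group dataset and return per-group training accuracy."""
    rng = np.random.default_rng(seed)
    centre = np.zeros(dim)
    centre[0] = shift
    teacher_major = rng.standard_normal(dim)
    # The minority labelling rule is only partially aligned with the majority one.
    teacher_minor = 0.5 * teacher_major + rng.standard_normal(dim)
    x0, y0 = sample_group(n_major, centre, teacher_major, rng)
    x1, y1 = sample_group(n_minor, -centre, teacher_minor, rng)
    x = np.vstack([x0, x1])
    y = np.concatenate([y0, y1])
    group = np.concatenate([np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)])
    theta, b = fit_logistic(x, y, group_weights(group, minority_weight))
    return group_accuracy(x, y, group, theta, b)


if __name__ == "__main__":
    for mw in (1.0, 9.0):
        print(mw, run(mw))
```

Comparing the per-group accuracies for different values of `minority_weight` gives a rough empirical feel for the fairness-accuracy trade-off discussed above; the size and direction of the effect depend on the assumed geometry and should not be read as a result of this work.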