Machine learning (ML) may be oblivious to human bias, but it is not immune to perpetuating it. Marginalisation and inequitable group representation are often traceable in the very data used for training, and may be reflected or even amplified by the learning models. In the present work, we aim to clarify the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias-inheritance mechanism. Through the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables commonly employed in fairness assessment. Despite the simplicity of the data model, we retrace and unpack typical unfairness behaviour observed on real-world datasets. We also obtain a detailed analytical characterisation of a class of bias-mitigation strategies. We first consider a basic loss-reweighing scheme, which allows for an implicit minimisation of different unfairness metrics, and we quantify the incompatibilities between several existing fairness criteria. We then consider a novel mitigation strategy based on a matched-inference approach, which introduces coupled learning models. Our theoretical analysis shows that this coupled strategy can strike superior fairness-accuracy trade-offs.
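To make the setting concrete, below is a minimal, self-contained sketch, not the paper's actual model, of the kind of pipeline the abstract alludes to: a synthetic two-group classification task with controllable group imbalance, a logistic-regression learner trained with a per-group loss-reweighing factor, and two standard fairness observables (per-group accuracy and the demographic-parity gap). All names and parameter values (rho, shift, w_minus, ...) are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact model): a two-group Gaussian data
# model with group imbalance, logistic regression trained by gradient
# descent with a per-group loss-reweighing factor, and simple fairness
# observables (per-group accuracy, demographic-parity gap).
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d, rho=0.8, shift=1.0):
    """Two sub-populations: group +1 (fraction rho) and group -1.
    Labels come from a common linear 'teacher'; group -1 inputs are
    shifted, mimicking geometric data imbalance (illustrative choice)."""
    g = np.where(rng.random(n) < rho, 1, -1)          # group membership
    teacher = rng.normal(size=d) / np.sqrt(d)
    x = rng.normal(size=(n, d))
    x[g == -1] += shift / np.sqrt(d)                  # group-dependent shift
    y = np.sign(x @ teacher)
    return x, y, g

def train_logreg(x, y, g, w_minus=1.0, lr=0.5, steps=500):
    """Gradient descent on a reweighed logistic loss: the loss of every
    sample from the under-represented group is multiplied by w_minus."""
    n, d = x.shape
    w = np.zeros(d)
    sample_w = np.where(g == -1, w_minus, 1.0)
    for _ in range(steps):
        margin = y * (x @ w)
        grad = -(sample_w * y / (1 + np.exp(margin))) @ x / n
        w -= lr * grad
    return w

def fairness_report(x, y, g, w):
    pred = np.sign(x @ w)
    for grp in (+1, -1):
        acc = np.mean(pred[g == grp] == y[g == grp])
        print(f"group {grp:+d}: accuracy = {acc:.3f}")
    # demographic-parity gap: difference in positive-prediction rates
    dp_gap = abs(np.mean(pred[g == 1] == 1) - np.mean(pred[g == -1] == 1))
    print(f"demographic-parity gap = {dp_gap:.3f}")

x, y, g = make_data(n=5000, d=50)
print("-- plain --")
fairness_report(x, y, g, train_logreg(x, y, g))
print("-- reweighed --")
fairness_report(x, y, g, train_logreg(x, y, g, w_minus=4.0))
```

Sweeping w_minus trades average accuracy against the per-group gaps, which is the kind of fairness-accuracy trade-off the analysis characterises exactly.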
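The coupled-models strategy can likewise be sketched. The abstract specifies only that coupled learning models are introduced under a matched-inference approach; the version below assumes, purely for illustration, one logistic learner per group with an elastic L2 coupling of strength gamma between the two weight vectors, with inference performed by the model matched to each sample's group. This may differ from the paper's exact formulation.

```python
# Illustrative sketch of a coupled-models mitigation: two per-group
# logistic learners tied by an elastic L2 coupling (an assumption of
# this sketch, not necessarily the paper's coupling).
import numpy as np

rng = np.random.default_rng(0)
n, d, rho, shift = 5000, 50, 0.8, 1.0
g = np.where(rng.random(n) < rho, 1, -1)              # group membership
teacher = rng.normal(size=d) / np.sqrt(d)
x = rng.normal(size=(n, d))
x[g == -1] += shift / np.sqrt(d)                      # group-dependent shift
y = np.sign(x @ teacher)

def coupled_train(x, y, g, gamma=1.0, lr=0.5, steps=500):
    """Each group gets its own weight vector; the penalty
    gamma/2 * ||w_plus - w_minus||^2 ties the two learners together."""
    d = x.shape[1]
    w = {+1: np.zeros(d), -1: np.zeros(d)}
    for _ in range(steps):
        grads = {}
        for grp in (+1, -1):
            xg, yg = x[g == grp], y[g == grp]
            margin = yg * (xg @ w[grp])
            grads[grp] = -(yg / (1 + np.exp(margin))) @ xg / len(yg)
        for grp in (+1, -1):
            w[grp] -= lr * (grads[grp] + gamma * (w[grp] - w[-grp]))
    return w

w = coupled_train(x, y, g, gamma=1.0)
for grp in (+1, -1):
    pred = np.sign(x[g == grp] @ w[grp])   # matched inference: own model
    print(f"group {grp:+d}: accuracy = {np.mean(pred == y[g == grp]):.3f}")
```

In this sketch gamma interpolates between fully independent per-group models (gamma → 0) and a single shared model (gamma → ∞), turning the fairness-accuracy trade-off into an explicit one-parameter family.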