Machine learning (ML) may be oblivious to human bias, but it is not immune to its perpetuation. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even amplified by the learning models. In the present work, we aim to clarify the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, in which parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Using the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables commonly employed for fairness assessment. Despite the simplicity of the data model, we retrace and unpack typical unfairness behaviour observed on real-world datasets. We also obtain a detailed analytical characterisation of a class of bias mitigation strategies. We first consider a basic loss-reweighting scheme, which allows for an implicit minimisation of different unfairness metrics, and quantify the incompatibilities between some existing fairness criteria. We then consider a novel mitigation strategy based on a matched inference approach, which consists of introducing coupled learning models. Our theoretical analysis of this approach shows that the coupled strategy can strike superior fairness-accuracy trade-offs.
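To make the idea of loss reweighting concrete, the following is a minimal numerical sketch, not the model or the analysis of this work. It assumes a hypothetical toy dataset of two Gaussian groups with group-specific labelling rules, a plain logistic loss, and one particular weighting choice (each group receives equal total weight). The function names `make_toy_data`, `balanced_weights`, `fit_weighted_logistic` and `group_accuracy`, as well as all parameter values, are illustrative assumptions.

```python
import numpy as np


def make_toy_data(n, rho, rng, d=10, shift=1.0):
    """Hypothetical toy data: a fraction `rho` of samples falls in group 0 (majority)."""
    group = (rng.random(n) >= rho).astype(int)  # 1 marks the minority group
    sign = np.where(group == 0, 1.0, -1.0)
    x = rng.standard_normal((n, d))
    x[:, 0] += sign * shift  # the two groups are centred at +/- shift along axis 0
    # Assumed group-specific labelling rules: majority uses axis 1, minority uses axis 2.
    score = np.where(group == 0, x[:, 1], x[:, 2])
    y = np.where(score >= 0, 1.0, -1.0)
    return x, y, group


def balanced_weights(group):
    """Per-sample weights n / (2 * n_g), so both groups carry equal total weight."""
    group = np.asarray(group)
    counts = np.bincount(group, minlength=2)
    return group.size / (2.0 * counts[group])


def fit_weighted_logistic(x, y, weights, steps=500, lr=0.5):
    """Gradient descent on the weighted logistic loss mean(w_i * log(1 + exp(-y_i z_i)))."""
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        margin = np.clip(y * (x @ w + b), -30.0, 30.0)  # clip to avoid overflow in exp
        # d/dz of log(1 + exp(-y z)) is -y / (1 + exp(y z)), scaled by the sample weight
        coeff = -weights * y / (1.0 + np.exp(margin))
        w -= lr * (x.T @ coeff) / n
        b -= lr * coeff.mean()
    return w, b


def group_accuracy(x, y, group, w, b):
    """Training accuracy computed separately for group 0 and group 1."""
    pred = np.where(x @ w + b >= 0, 1.0, -1.0)
    return [float(np.mean(pred[group == g] == y[group == g])) for g in (0, 1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y, group = make_toy_data(n=2000, rho=0.9, rng=rng)
    for name, weights in [("uniform", np.ones(len(y))),
                          ("reweighted", balanced_weights(group))]:
        w, b = fit_weighted_logistic(x, y, weights)
        print(name, group_accuracy(x, y, group, w, b))
```

Comparing the per-group accuracies under uniform and balanced weights gives a rough feel for the fairness-accuracy trade-off. Other weightings would target other fairness criteria, and equal total weight per group is only one choice.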