Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias

Model-induced distribution shifts (MIDS) occur when the outputs of earlier models pollute the training sets of later models over successive generations. This is known as model collapse in the case of generative models, and as performative prediction or unfairness feedback loops in the case of supervised models. When a model induces a distribution shift, it also encodes its mistakes, biases, and unfairnesses into the ground truth of its data ecosystem. We introduce a framework that allows us to track multiple MIDS over many generations, finding that they can lead to losses in performance, fairness, and minoritized group representation, even in initially unbiased datasets. Despite these negative consequences, we identify how models might be used for positive, intentional interventions in their data ecosystems, providing redress for historical discrimination through a framework called algorithmic reparation (AR). We simulate AR interventions by curating representative training batches for stochastic gradient descent to demonstrate how AR can improve upon the unfairnesses of models and data ecosystems subject to other MIDS. Our work takes an important step towards identifying, mitigating, and taking accountability for the unfair feedback loops enabled by the idea that ML systems are inherently neutral and objective.
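The idea of curating representative training batches for stochastic gradient descent can be illustrated with a small sketch. The abstract does not specify the curation procedure, so the code below is only one plausible reading: it assumes each training example carries a group label and fills every batch with near-equal counts per group, sampling with replacement within a group. The function name `curate_balanced_batch` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def curate_balanced_batch(groups, batch_size, rng):
    """Return example indices for one batch with near-equal counts per group.

    Assumption (not from the paper): `groups` holds one group label per
    training example, and "representative" means equal group shares.
    Sampling is with replacement inside each group, so a small group can
    still fill its share of the batch.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    labels = sorted(by_group)
    # Split the batch evenly; the first `extra` groups get one more slot.
    base, extra = divmod(batch_size, len(labels))

    batch = []
    for i, label in enumerate(labels):
        quota = base + (1 if i < extra else 0)
        batch.extend(rng.choices(by_group[label], k=quota))

    rng.shuffle(batch)  # avoid group-ordered batches
    return batch


# Hypothetical dataset: 90 examples from group "a", 10 from group "b".
rng = random.Random(0)
groups = ["a"] * 90 + ["b"] * 10
batch = curate_balanced_batch(groups, 20, rng)
minority_share = sum(groups[i] == "b" for i in batch) / len(batch)
```

In this toy setup the minority group makes up 10% of the data but half of each curated batch. Whether equal shares, population-proportional shares, or some other target is appropriate is a design choice that the abstract leaves open.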

