With the growing focus on fairness in machine learning models, it is imperative not only to assess and mitigate bias at each stage of the machine learning pipeline but also to understand the downstream impacts of bias across stages. Here we consider a general but realistic scenario in which a predictive model is learned from (potentially biased) training data, and the model's predictions are assessed post hoc for fairness by some auditing method. We provide a theoretical analysis of how a specific form of data bias, differential sampling bias, propagates from the data stage to the prediction stage. Unlike prior work, we evaluate the downstream impacts of data biases quantitatively rather than qualitatively and prove theoretical guarantees for detection. Under reasonable assumptions, we quantify how the amount of bias in the model predictions varies as a function of the amount of differential sampling bias in the data, and at what point this bias becomes provably detectable by the auditor. Through experiments on two criminal justice datasets (the well-known COMPAS dataset and historical data from NYPD's stop-and-frisk policy), we demonstrate that the theoretical results hold in practice even when our assumptions are relaxed.