On Testing and Comparing Fair Classifiers under Data Bias

In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We study, both theoretically and empirically, the effect of these biases on the accuracy and fairness of fair classifiers. Theoretically, we prove that the Bayes optimal group-aware fair classifier on the original data distribution can be recovered by simply minimizing a carefully chosen reweighted loss on the bias-injected distribution. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias into their training data (but not the test data). Our main observations are: (1) the fairness and accuracy of many standard fair classifiers degrade severely as the bias injected into their training data increases; (2) a simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data; and (3) a few simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk into existing fairness dashboards for real-world deployments.
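To make the bias-injection setup concrete, here is a minimal Python sketch under stated assumptions. It follows our reading of the Blum & Stangl (2019) model: positive examples of the disadvantaged group are kept only with probability `beta` (under-representation), and surviving positives of that group have their label flipped to negative with probability `nu` (label bias). Bias is injected into the training split only. The sketch pairs this with a simple inverse-probability reweighting that corrects under-representation alone. The names `Example`, `inject_bias`, `under_representation_weights`, and `make_toy_data`, the group labels "A"/"B", and the order in which the two biases are applied are all hypothetical. The carefully chosen reweighted loss from the theoretical result is also designed to handle label bias and is not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    x: float      # a single feature, for illustration only
    group: str    # hypothetical labels: "A" (advantaged) or "B" (disadvantaged)
    label: int    # 1 = favorable outcome, 0 = unfavorable


def inject_bias(data, beta, nu, disadvantaged="B", seed=0):
    """Return a biased copy of `data` (apply to the training split only).

    Assumed bias model (our reading of Blum & Stangl, 2019):
      * under-representation: each positive of the disadvantaged group is
        kept with probability `beta` (dropped with probability 1 - beta);
      * label bias: each surviving positive of that group has its label
        flipped from 1 to 0 with probability `nu`.
    """
    if not (0.0 <= beta <= 1.0 and 0.0 <= nu <= 1.0):
        raise ValueError("beta and nu must lie in [0, 1]")
    rng = random.Random(seed)
    biased = []
    for ex in data:
        if ex.group == disadvantaged and ex.label == 1:
            if rng.random() >= beta:   # under-representation: drop the example
                continue
            if rng.random() < nu:      # label bias: flip 1 -> 0
                ex = Example(ex.x, ex.group, 0)
        biased.append(ex)
    return biased


def under_representation_weights(biased, beta, disadvantaged="B"):
    """Inverse-probability weights that undo under-representation only.

    Observed positives of the disadvantaged group get weight 1/beta, so when
    nu == 0 the expected weighted sum of per-example losses over `biased`
    equals the unweighted sum over the original data. This is a simplified
    stand-in, not the paper's reweighted loss.
    """
    if beta <= 0.0:
        raise ValueError("beta must be positive to reweight")
    return [1.0 / beta if (ex.group == disadvantaged and ex.label == 1) else 1.0
            for ex in biased]


def make_toy_data(n, seed=1):
    """Hypothetical toy data: balanced groups and labels, one Gaussian feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        group = "A" if rng.random() < 0.5 else "B"
        label = 1 if rng.random() < 0.5 else 0
        data.append(Example(rng.gauss(0.0, 1.0), group, label))
    return data


if __name__ == "__main__":
    train = make_toy_data(40_000)
    biased = inject_bias(train, beta=0.5, nu=0.0)
    weights = under_representation_weights(biased, beta=0.5)
    orig_pos_b = sum(1 for ex in train if ex.group == "B" and ex.label == 1)
    weighted_pos_b = sum(w for ex, w in zip(biased, weights)
                         if ex.group == "B" and ex.label == 1)
    print(orig_pos_b, round(weighted_pos_b))
```

With `nu=0.0`, the two printed counts should be close. With `nu > 0`, inverse-probability weights alone no longer suffice, because flipped labels are indistinguishable from genuine negatives in the observed training data.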

翻译：摘要：本文考虑了一个用于注入数据偏差的理论模型，即代表性不足偏差和标签偏差（Blum & Stangl, 2019）。我们从理论和实证两个角度研究了其对公平分类器准确性与公平性的影响。理论上，我们证明，在原始数据分布上的贝叶斯最优组感知公平分类器，可以通过在偏差注入分布上对精心选择的重新加权损失进行简单最小化而恢复。通过在合成数据集和真实世界数据集（如 Adult、German Credit、Bank Marketing、COMPAS）上的广泛实验，我们通过在其训练数据（而非测试数据）中注入不同量的代表性不足偏差和标签偏差，对来自标准公平工具包的预处理、处理中和后处理公平分类器进行了公平性与准确性的实证审计。主要观测结果如下：（1）许多标准公平分类器的公平性与准确性随着训练数据中注入偏差的增加而严重下降；（2）在正确数据上训练的简单逻辑回归模型，其准确性与公平性往往优于大多数在偏差训练数据上训练的公平分类器；（3）少数简单的公平性技术（如重新加权、指数梯度法）即使在训练数据被注入代表性不足偏差和标签偏差时，似乎仍能提供稳定的准确性与公平性保障。我们的实验还展示了如何将数据偏差风险的度量整合到现有的公平性仪表盘中，以用于实际部署。