Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing, which roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation. Interpreting our findings, we recall a widely overlooked theoretical argument, presented seven years ago, that accurately predicted what we observe.
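As a rough illustration of postprocessing at different levels of constraint relaxation, the sketch below picks one decision threshold per group so that accuracy is maximized while the gaps in true and false positive rates between groups stay within a tolerance. It is not the procedure evaluated in the study: the function names, the toy data, and the deterministic grid search over thresholds are all assumptions made for illustration. Setting the tolerance to infinity drops the constraint entirely, which is loosely the direction described above as the inverse of postprocessing; whether that matches the study's exact definition of unprocessing is likewise an assumption.

```python
from itertools import product


def rates(scores, labels, threshold):
    """Return (TPR, FPR, number correct) when predicting 1 iff score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr, tp + tn


def fit_group_thresholds(scores, labels, groups, tolerance):
    """Hypothetical helper: exhaustive search over per-group thresholds.

    Maximizes overall accuracy subject to both the TPR gap and the FPR gap
    between groups being at most `tolerance`.
    Returns (thresholds by group, accuracy, achieved gap).
    """
    by_group = {}
    for s, y, g in zip(scores, labels, groups):
        by_group.setdefault(g, ([], []))
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    names = sorted(by_group)

    # Candidate thresholds: every observed score, plus +inf (predict all negative).
    options = []
    for g in names:
        gs, gy = by_group[g]
        candidates = sorted(set(gs)) + [float("inf")]
        options.append([(t, *rates(gs, gy, t)) for t in candidates])

    best = None
    for combo in product(*options):
        tprs = [c[1] for c in combo]
        fprs = [c[2] for c in combo]
        gap = max(max(tprs) - min(tprs), max(fprs) - min(fprs))
        if gap > tolerance + 1e-12:
            continue  # violates the (relaxed) error-rate constraint
        correct = sum(c[3] for c in combo)
        if best is None or correct > best[0]:
            best = (correct, gap, {g: c[0] for g, c in zip(names, combo)})

    correct, gap, thresholds = best
    return thresholds, correct / len(scores), gap


# Toy data (made up): group "a" is perfectly separable, group "b" is not.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.5, 0.2]
labels = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

for tol in (0.0, 0.5, float("inf")):
    thresholds, accuracy, gap = fit_group_thresholds(scores, labels, groups, tol)
    print(f"tolerance={tol}: accuracy={accuracy:.3f}, gap={gap:.3f}, thresholds={thresholds}")
```

Sweeping the tolerance from zero upward traces out one fairness-accuracy trade-off curve for a fixed set of scores, so two methods can be compared at the same level of relaxation rather than at whatever level each happened to reach.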