Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation.
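As an illustration of the comparison described above, the following minimal sketch shows how a method can be placed in the fairness-accuracy plane and checked against a Pareto frontier. It is not the authors' evaluation code. The function names (`error_rate_gap`, `dominates`, `pareto_frontier`), the choice of the largest between-group gap in false positive or false negative rate as the fairness violation, and the toy numbers in the usage note are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# A point is (accuracy, fairness_violation): higher accuracy is better,
# lower violation is better. This representation is an assumption of the sketch.
Point = Tuple[float, float]


def error_rate_gap(y_true: Sequence[int], y_pred: Sequence[int],
                   group: Sequence[str]) -> float:
    """Largest between-group difference in false positive or false negative rate.

    Assumes binary labels in {0, 1} and that every group contains at least
    one positive and one negative example.
    """
    fprs, fnrs = [], []
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        neg = [i for i in idx if y_true[i] == 0]
        pos = [i for i in idx if y_true[i] == 1]
        fprs.append(sum(y_pred[i] == 1 for i in neg) / len(neg))
        fnrs.append(sum(y_pred[i] == 0 for i in pos) / len(pos))
    return max(max(fprs) - min(fprs), max(fnrs) - min(fnrs))


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by accuracy."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))
```

For example, with the hypothetical points `[(0.80, 0.10), (0.78, 0.12), (0.82, 0.20), (0.75, 0.02)]`, the second point is dominated by the first and drops off the frontier. Comparing methods this way is only meaningful when they share a base model and are evaluated at matching levels of constraint relaxation, which are the two confounders the abstract names.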