In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as imbalances or biases in the data, which makes imposing a uniform fairness requirement across diverse datasets questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By leveraging the You-Only-Train-Once (YOTO) framework, our approach avoids the computational burden of training multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets show that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in state-of-the-art fairness methods.
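To make the YOTO idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a classifier conditioned on the fairness weight lambda: lambda is sampled per training step, FiLM-style conditioning injects it into the network, and the loss adds a lambda-weighted fairness penalty (here a soft demographic-parity gap, assumed for illustration). All names (`LambdaConditionedNet`, `train_step`, the lambda range) are hypothetical.

```python
import torch
import torch.nn as nn

class LambdaConditionedNet(nn.Module):
    """Illustrative classifier conditioned on the fairness weight lambda (FiLM-style)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Hypothetical conditioning: map lambda to a per-feature scale and shift.
        self.film = nn.Linear(1, 2 * hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, lam):
        h = self.body(x)
        gamma, beta = self.film(lam).chunk(2, dim=-1)
        return self.head(gamma * h + beta).squeeze(-1)

def demographic_parity_gap(logits, group):
    """Soft demographic-parity gap: difference in mean predicted positive rate
    between the two groups (assumes both groups appear in the batch)."""
    p = torch.sigmoid(logits)
    return (p[group == 1].mean() - p[group == 0].mean()).abs()

def train_step(model, opt, x, y, group, lam_range=(1e-3, 10.0)):
    # Sample a trade-off weight for this step so one model covers the whole curve.
    lam = torch.empty(1).uniform_(*lam_range)
    logits = model(x, lam.view(1, 1).expand(x.size(0), 1))
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y.float())
    loss = loss + lam.item() * demographic_parity_gap(logits, group)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At evaluation time, sweeping lambda over a grid and querying the single trained model at each value traces out an approximation of the dataset-specific fairness-accuracy trade-off curve without retraining.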