This paper introduces the Fair Fairness Benchmark (\textsf{FFB}), a benchmarking framework for in-processing group fairness methods. Ensuring fairness in machine learning is essential for ethical compliance. However, comparing and developing fairness methods remains challenging due to inconsistent experimental settings, a lack of accessible algorithmic implementations, and the limited extensibility of current fairness packages and tools. To address these issues, we introduce an open-source, standardized benchmark for evaluating in-processing group fairness methods and provide a comprehensive analysis of state-of-the-art methods for ensuring different notions of group fairness. This work offers the following key contributions: flexible, extensible, minimalistic, and research-oriented open-source code; unified benchmarking pipelines for fairness methods; and extensive benchmarking, which yields key insights from $\mathbf{45{,}079}$ experiments and $\mathbf{14{,}428}$ GPU hours. We believe that our work will significantly facilitate the growth and development of the fairness research community.