How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies

With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem. In response, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, each designed to maximise a given notion of fairness. However, fair solutions rely on the quality of the training data and can be highly sensitive to noise. Recent studies have shown that robustness (the ability of a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem; hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies: the robustness ratio. We conduct extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair in low-noise scenarios but fairer in high-noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline for choosing the most suitable fairness strategy for various data sets.
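
As a purely illustrative aid, the sketch below shows how one might observe the noise sensitivity of a threshold-based strategy on toy data. It is not the robustness ratio, which this abstract does not define, and it does not reproduce the experiments. Everything in it is an assumption made for illustration: the toy scores, the group-specific thresholds, the Gaussian noise model, the choice of demographic parity as the fairness definition, and all function names.

```python
import random


def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rate between group 0 and group 1."""
    rates = []
    for g in (0, 1):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])


def threshold_predict(scores, group, thresholds):
    """Predict 1 when a score reaches the threshold assigned to its group."""
    return [int(s >= thresholds[a]) for s, a in zip(scores, group)]


def mean_gap_under_noise(scores, group, thresholds, sigma, trials=200, seed=0):
    """Average demographic parity gap after adding Gaussian noise to the scores."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = [s + rng.gauss(0.0, sigma) for s in scores]
        total += demographic_parity_difference(
            threshold_predict(noisy, group, thresholds), group
        )
    return total / trials


# Hypothetical toy data: model scores and a binary sensitive attribute.
scores = [0.2, 0.4, 0.6, 0.8, 0.3, 0.5, 0.7, 0.9]
group = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical thresholds tuned so both groups have equal positive rates on clean data.
thresholds = {0: 0.5, 1: 0.6}

if __name__ == "__main__":
    for sigma in (0.0, 0.1, 0.3):
        gap = mean_gap_under_noise(scores, group, thresholds, sigma)
        print(f"sigma={sigma:.1f}  mean parity gap={gap:.3f}")
```

On the clean toy data the thresholds give a parity gap of exactly zero; running the loop with larger values of `sigma` shows how that gap behaves once the scores are perturbed. The same pattern could be repeated for other fairness definitions or other strategies.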