Fairness in machine learning (ML) is a rapidly growing field of research because algorithmic discrimination can cause harm in many ways. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how the quantification of fairness can be subverted, describing a practice we call "fairness hacking" whose purpose is to conceal unfairness in algorithms. This practice affects end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two categories of fairness hacking by analogy with the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding sensitive attributes to, or removing them from, the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can be applied to similarly prevent or reduce fairness hacking. The second category, inter-metric fairness hacking, is the search, with the sensitive attributes held fixed, for a specific metric under which the algorithm appears fair. We argue that countermeasures to prevent or reduce inter-metric fairness hacking are still in their infancy. Finally, we demonstrate both types of fairness hacking using real datasets. Our paper is intended to serve as guidance for discussions within the fair ML community on preventing or reducing the misuse of fairness metrics, and thus on reducing overall harm from ML applications.
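The two categories can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper's experiments: the toy records, the attribute names `sex` and `age`, the helper names, and the choice of demographic parity and equal opportunity gaps as the two example metrics. The sketch only enumerates the options an analyst could selectively report from.

```python
from itertools import combinations

def group_rates(rows, attrs, keep=lambda r: True):
    """Share of positive predictions per subgroup defined by `attrs`."""
    counts = {}
    for r in rows:
        if not keep(r):
            continue
        key = tuple(r[a] for a in attrs)
        n, pos = counts.get(key, (0, 0))
        counts[key] = (n + 1, pos + r["y_pred"])
    return {k: pos / n for k, (n, pos) in counts.items()}

def demographic_parity_gap(rows, attrs):
    """Largest difference in positive-prediction rate between subgroups."""
    rates = group_rates(rows, attrs)
    return max(rates.values()) - min(rates.values())

def equal_opportunity_gap(rows, attrs):
    """Largest difference in true-positive rate between subgroups."""
    rates = group_rates(rows, attrs, keep=lambda r: r["y_true"] == 1)
    return max(rates.values()) - min(rates.values())

def scan_attribute_subsets(rows, candidate_attrs, metric):
    """Intra-metric view: one metric, every non-empty attribute subset."""
    return {
        subset: metric(rows, subset)
        for k in range(1, len(candidate_attrs) + 1)
        for subset in combinations(candidate_attrs, k)
    }

def scan_metrics(rows, attrs, metrics):
    """Inter-metric view: fixed attributes, several metrics."""
    return {name: fn(rows, attrs) for name, fn in metrics.items()}

def make_rows(sex, age, pairs):
    """Build toy records from (y_true, y_pred) pairs (hypothetical data)."""
    return [{"sex": sex, "age": age, "y_true": t, "y_pred": p} for t, p in pairs]

# Hypothetical toy data, constructed so that the disparity is visible
# only when both attributes are considered jointly.
ROWS = (
    make_rows("F", "young", [(1, 1), (1, 1), (0, 1), (0, 0)])
    + make_rows("F", "old", [(1, 1), (1, 0), (0, 0), (0, 0)])
    + make_rows("M", "young", [(1, 1), (0, 0), (0, 0), (0, 0)])
    + make_rows("M", "old", [(1, 1), (1, 1), (1, 1), (0, 0)])
)

METRICS = {
    "demographic_parity": demographic_parity_gap,
    "equal_opportunity": equal_opportunity_gap,
}

if __name__ == "__main__":
    print(scan_attribute_subsets(ROWS, ("sex", "age"), demographic_parity_gap))
    print(scan_metrics(ROWS, ("sex",), METRICS))
```

On this toy data the demographic parity gap is 0 when only `sex` is analyzed but 0.5 for the `sex` and `age` intersection, and for `sex` alone the equal opportunity gap is 0.25 while the demographic parity gap is 0. Reporting only the most favorable entry from either scan would be an instance of the corresponding kind of fairness hacking, whereas reporting the full tables makes the selection visible.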