Inducing Group Fairness in Prompt-Based Language Model Decisions

Classifiers are used throughout industry to enforce policies, ranging from the detection of toxic content to age-appropriate content filtering. While these classifiers serve important functions, it is also essential that they are built in ways that minimize unfair biases for users. One such fairness consideration is group fairness, which requires that different sub-populations of users receive equal treatment. This is a well-studied problem in the context of 'classical' classifiers. However, the emergence of prompt-based language model (LM) decision making has created new opportunities to solve text-based classification tasks, and the fairness properties of these new classifiers are not yet well understood. Further, the 'remediation toolkit' is incomplete for LM-based decision makers, and little is understood about how to improve decision-maker group fairness while maintaining classifier performance. This work sets out to add more tools to that toolkit. We introduce adaptations of existing effective approaches from classical classifier fairness to the prompt-based classifier space. We also devise simple methods that take advantage of the new structure of prompt-based decision makers and operate at the prompt level. We compare these approaches empirically on real data. Our results suggest that adaptations of approaches that are effective for classical classifiers remain effective in the LM-based classifier environment. However, there is room for further exploration of prompt-based remediation methods (and other remediation methods that take advantage of LM structure).
