Employing large language models (LLMs) in downstream applications such as classification is crucial, especially for smaller companies that lack the expertise and resources required to fine-tune a model. Fairness in LLMs helps ensure inclusivity and equal representation across factors such as race and gender, and promotes responsible AI deployment. As LLMs become increasingly prevalent, it is essential to assess whether they can generate fair outcomes when given fairness considerations. In this study, we introduce a framework of fairness rules aligned with various fairness definitions, with each definition expressed at varying degrees of abstraction. We explore the configuration for in-context learning and the procedure for selecting in-context demonstrations using retrieval-augmented generation (RAG), while incorporating fairness rules into the process. Experiments with different LLMs indicate that GPT-4 delivers better results than the other models in terms of both accuracy and fairness. This work is one of the early attempts to achieve fairness in prediction tasks by using LLMs through in-context learning.