Large Language Models (LLMs) have demonstrated remarkable success across various domains. However, despite their strong performance in numerous real-world applications, most of these models lack fairness considerations and may consequently produce discriminatory outcomes against certain communities, particularly marginalized populations. This has prompted extensive study of fair LLMs. Moreover, fairness in LLMs, in contrast to fairness in traditional machine learning, involves distinct backgrounds, taxonomies, and fulfillment techniques. To this end, this survey presents a comprehensive overview of recent advances in the literature on fair LLMs. Specifically, a brief introduction to LLMs is provided, followed by an analysis of the factors contributing to bias in LLMs. The concept of fairness in LLMs is then discussed categorically, summarizing metrics for evaluating bias in LLMs and existing algorithms for promoting fairness. Furthermore, resources for evaluating bias in LLMs, including toolkits and datasets, are summarized. Finally, existing research challenges and open questions are discussed.