Large language models (LLMs) have demonstrated strong performance and promising development prospects, and they are widely deployed in the real world. However, LLMs can capture social biases from unprocessed training data and propagate those biases to downstream tasks. Unfair LLM systems have undesirable social impacts and can cause harm. In this paper, we provide a comprehensive review of research on fairness in LLMs. First, for medium-scale LLMs, we introduce evaluation metrics and debiasing methods from the perspectives of intrinsic bias and extrinsic bias, respectively. Then, for large-scale LLMs, we introduce recent fairness research, including fairness evaluation, reasons for bias, and debiasing methods. Finally, we discuss the challenges and future directions for the development of fairness in LLMs and offer insights into them.