Large Language Models (LLMs) exhibit remarkably powerful capabilities, and a crucial factor in their success is aligning their outputs with human preferences. This alignment process often requires only a small amount of data to efficiently enhance an LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to understand; moreover, the relationships between different methods remain under-explored, which limits the development of preference alignment. In light of this, we break down existing popular alignment strategies into their constituent components and provide a unified framework for studying them, thereby establishing connections among them. In this survey, we decompose all strategies in preference learning into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and opens up possibilities to synergize the strengths of different strategies. Furthermore, we present detailed working examples of prevalent algorithms to facilitate a comprehensive understanding for readers. Finally, based on this unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.