From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models

Big models, exemplified by Large Language Models (LLMs), are models typically pre-trained on massive data and comprised of enormous parameters, which not only obtain significantly improved performance across diverse tasks but also present emergent capabilities absent in smaller models. However, the growing intertwining of big models with everyday human lives poses potential risks and might cause serious social harm. Therefore, many efforts have been made to align LLMs with humans to make them better follow user instructions and satisfy human preferences. Nevertheless, `what to align with' has not been fully discussed, and inappropriate alignment goals might even backfire. In this paper, we conduct a comprehensive survey of different alignment goals in existing work and trace their evolution paths to help identify the most essential goal. Particularly, we investigate related works from two perspectives: the definition of alignment goals and alignment evaluation. Our analysis encompasses three distinct levels of alignment goals and reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs. Based on such results, we further discuss the challenges of achieving such intrinsic value alignment and provide a collection of available resources for future research on the alignment of big models.

翻译：大模型（以大型语言模型LLMs为例）通常指在海量数据上预训练、包含海量参数的模型，它们不仅能在各类任务中取得显著性能提升，还展现出小型模型所不具备的涌现能力。然而，随着大模型与人类日常生活日益交织，潜在风险随之而来，甚至可能引发严重的社会危害。为此，学界已开展大量工作使LLMs与人类对齐，以使其更好地遵循用户指令并满足人类偏好。但"与什么对齐"这一核心问题尚未得到充分讨论，不恰当的对齐目标甚至可能适得其反。本文对现有工作中不同对齐目标进行系统综述，追溯其演化路径以帮助识别最本质的目标。具体而言，我们从对齐目标的定义与对齐评估两个维度展开研究，涵盖三个不同层次的对齐目标，揭示出从基础能力到价值取向的目标转换趋势，表明内在人类价值观有望成为增强型LLMs的对齐目标。基于此，我们进一步探讨实现此类内在价值对齐面临的挑战，并为未来大模型对齐研究提供可用资源汇编。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日