From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models

Big models, exemplified by Large Language Models (LLMs), are models typically pre-trained on massive data and comprised of enormous parameters, which not only obtain significantly improved performance across diverse tasks but also present emergent capabilities absent in smaller models. However, the growing intertwining of big models with everyday human lives poses potential risks and might cause serious social harm. Therefore, many efforts have been made to align LLMs with humans to make them better follow user instructions and satisfy human preferences. Nevertheless, `what to align with' has not been fully discussed, and inappropriate alignment goals might even backfire. In this paper, we conduct a comprehensive survey of different alignment goals in existing work and trace their evolution paths to help identify the most essential goal. Particularly, we investigate related works from two perspectives: the definition of alignment goals and alignment evaluation. Our analysis encompasses three distinct levels of alignment goals and reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs. Based on such results, we further discuss the challenges of achieving such intrinsic value alignment and provide a collection of available resources for future research on the alignment of big models.

翻译：大模型（以大型语言模型LLMs为例）通常是在海量数据上预训练并包含海量参数的模型，这不仅使其在各种任务中性能显著提升，还涌现出小型模型所不具备的能力。然而，大模型与人类日常生活的日益交织带来了潜在风险，可能造成严重的社会危害。因此，学界已投入大量工作使LLMs与人类对齐，以使其更好地遵循用户指令并满足人类偏好。但"与什么对齐"这一问题尚未得到充分探讨，不当的对齐目标甚至可能适得其反。本文对现有工作中不同的对齐目标进行了全面综述，追溯其演化路径以帮助识别最本质的目标。具体而言，我们从两个维度考察相关研究：对齐目标的定义与对齐评估。我们的分析涵盖三个层次的对齐目标，揭示出从基础能力到价值导向的目标转化过程，表明内在人类价值作为增强型LLMs对齐目标的潜力。基于这些发现，我们进一步探讨了实现此类内在价值对齐所面临的挑战，并提供了一系列可用资源，以期助力大模型对齐领域的未来研究。

相关内容

MoDELS

关注 46

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日