Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems. However, it is generally costly and unstable to fine-tune large foundation models using reinforcement learning (RL), and the multi-dimensionality, heterogeneity, and conflicting nature of human preferences further complicate the alignment process. In this paper, we introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context and applies supervised fine-tuning for alignment. The salient features of RiC are simplicity and adaptivity, as it only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time. Inspired by the analytical solution of an abstracted convex optimization problem, our dynamic inference-time adjustment method approaches the Pareto-optimal solution for multiple objectives. Empirical evidence demonstrates the efficacy of our method in aligning both Large Language Models (LLMs) and diffusion models to accommodate diverse rewards with only around $10\%$ GPU hours compared with multi-objective RL baseline.

翻译：我们考虑了基础模型与人类偏好的多目标对齐问题，这是构建有益且无害的人工智能系统的关键步骤。然而，使用强化学习（RL）微调大型基础模型通常成本高昂且不稳定，而人类偏好的多维性、异质性和冲突性进一步复杂化了对齐过程。本文提出了“上下文中的奖励”（RiC）方法，该方法将基础模型的响应条件设置为提示上下文中的多个奖励，并应用监督微调进行对齐。RiC的显著特点是简单性和自适应性，因为它只需对单个基础模型进行监督微调，并在推理期间支持用户偏好的动态调整。受抽象凸优化问题解析解的启发，我们的动态推理时调整方法能够趋近于多目标的帕累托最优解。实验证据表明，我们的方法在对齐大型语言模型（LLMs）和扩散模型以适应多样化奖励方面具有有效性，相比多目标强化学习基线，仅需约10%的GPU时长。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日