For aligning large language models (LLMs), prior work has leveraged reinforcement learning from human feedback (RLHF) or variants of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood estimation, it sacrifices the ability to easily tune language models toward auxiliary, non-preferential objectives specified by the LLM designer (e.g., tuning lexical style or minimizing specific kinds of harmful content). Critically, these designer objectives may not be amply human-labeled or represented in available data, may not align with user preferences, and may not even be tractably captured by binary preference pairs. To combine the simplicity and performance of DPO with the generality of RL, we propose a unified approach. Based on a simple decomposition of preference and auxiliary objectives, our method tunes LLMs to optimize both user and designer preferences without additional specialized or preference data, extra computational cost, stability ``tweaks'', or training instability. The proposed method, Unified Preference Optimization, effectively generalizes to user preferences and auxiliary objectives, while preserving or surpassing alignment performance on challenging benchmarks across a range of model sizes.
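As a rough illustration of what such a decomposition could look like, consider the following sketch; the notation ($\pi_{\mathrm{ref}}$ for the reference policy, $\beta$ for the KL temperature, $r_{\mathrm{aux}}$ for a designer-specified auxiliary reward, $\mathcal{D}$ for the preference dataset) is assumed for exposition and is not necessarily the paper's exact formulation. If the total reward decomposes as $r = r_{\mathrm{pref}} + r_{\mathrm{aux}}$ and preferences follow a Bradley--Terry model on $r_{\mathrm{pref}}$ alone, then applying DPO's change of variables to $r_{\mathrm{pref}}$ yields a DPO-style loss whose implicit reward margin is shifted by the auxiliary reward difference:
\begin{align*}
  r(x, y) &= r_{\mathrm{pref}}(x, y) + r_{\mathrm{aux}}(x, y),\\
  \mathcal{L}(\pi_\theta) &= -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\!\left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    - \bigl(r_{\mathrm{aux}}(x, y_w) - r_{\mathrm{aux}}(x, y_l)\bigr)
  \right)\right].
\end{align*}
Under these assumptions, the partition-function terms cancel in the pairwise difference, so the auxiliary objective enters only as an offset on the usual DPO log-ratio margin and requires no extra preference data or separate reward-model training.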