The success of AI assistants based on Large Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in a single inference step, eliminating the reliance on data annotation and model training. Linear alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of the optimal policy in closed form and facilitates the direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset will be published at \url{https://github.com/Wizardcoast/Linear_Alignment.git}.
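For intuition about what "extracting the optimal policy in closed form under a divergence constraint" means, the sketch below uses the standard KL-regularized objective $\max_\pi \mathbb{E}_{y \sim \pi}[r(x,y)] - \beta\,\mathrm{KL}(\pi \,\|\, \pi_{\text{ref}})$, whose well-known maximizer is $\pi^*(y \mid x) = \pi_{\text{ref}}(y \mid x)\exp(r(x,y)/\beta)/Z(x)$. This is only the textbook result over a small finite set of candidate responses. It is not the new parameterization proposed here, which this summary does not specify. The function name, the toy probabilities, and the rewards are hypothetical.

```python
import math


def kl_constrained_optimal_policy(ref_probs, rewards, beta):
    """Hypothetical helper: closed-form maximizer of E_pi[r] - beta * KL(pi || pi_ref)
    over a finite set of candidate responses (textbook result, not the paper's method).

    ref_probs: reference-policy probabilities pi_ref(y) (at least one must be > 0)
    rewards:   reward r(y) for each candidate
    beta:      strength of the KL (divergence) constraint, must be > 0
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(ref_probs) != len(rewards):
        raise ValueError("ref_probs and rewards must have the same length")

    # Work in log space: log pi_ref(y) + r(y) / beta; zero-probability candidates get -inf.
    logits = [
        (math.log(p) if p > 0 else float("-inf")) + r / beta
        for p, r in zip(ref_probs, rewards)
    ]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    unnormalized = [math.exp(l - m) for l in logits]
    z = sum(unnormalized)  # partition function Z(x), up to the factor exp(m)
    return [u / z for u in unnormalized]


if __name__ == "__main__":
    ref = [0.5, 0.3, 0.2]
    r = [0.0, 1.0, 2.0]
    for beta in (1e6, 1.0, 1e-3):
        print(beta, kl_constrained_optimal_policy(ref, r, beta))
```

As $\beta$ grows, the divergence penalty dominates and the result stays close to $\pi_{\text{ref}}$. As $\beta$ shrinks, probability mass concentrates on the highest-reward candidate.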