The success of AI assistants based on Large Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in a single inference step, eliminating the reliance on data annotation and model training. Linear Alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of the optimal policy in closed form and facilitates direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that Linear Alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset are published at \url{https://github.com/Wizardcoast/Linear_Alignment.git}.
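For context, a minimal sketch of the standard closed-form result that divergence-constrained policy optimization builds on: maximizing expected reward under a KL constraint to a reference policy admits a well-known closed-form optimum. The symbols below ($\pi_{\mathrm{ref}}$, $r$, $\beta$, $Z$) follow the common RLHF convention and are used here for illustration; they are not necessarily the paper's exact parameterization.
% Standard KL-constrained RLHF objective (common convention, shown for
% background; not necessarily the paper's exact parameterization):
%   max_pi  E_{y ~ pi}[ r(x, y) ] - beta * KL( pi || pi_ref )
% whose maximizer has the well-known closed form below.
\begin{align}
  \pi^{*}(y \mid x)
    &= \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)
       \exp\!\left(\tfrac{1}{\beta}\, r(x, y)\right), \\
  Z(x)
    &= \sum_{y} \pi_{\mathrm{ref}}(y \mid x)
       \exp\!\left(\tfrac{1}{\beta}\, r(x, y)\right).
\end{align}
The partition function $Z(x)$ is what normally makes this optimum intractable to sample from directly; a parameterization that sidesteps it is what permits extracting the aligned response at inference time.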