With the growing use of large language models (LLMs) across domains, aligning models with human preferences has become one of the most critical aspects of training. Preference optimization methods (*PO) are at the forefront of state-of-the-art human alignment. However, prior research has often concentrated on identifying the best-performing method, typically through a grid search over hyperparameters, which can be impractical for general practitioners. In this paper, we aim to identify the algorithm that, while remaining performant, is also more robust to varying hyperparameters, thereby increasing the likelihood of achieving good results. We focus on a realistic out-of-distribution (OOD) scenario that mirrors real-world applications of human alignment, offering practical insights into the strengths and weaknesses of these methods. Furthermore, to better understand the shortcomings of the generations produced by the different methods, we analyze the model outputs through the lens of KL divergence from the SFT model and response length statistics. Our analysis reveals that the widely adopted DPO method consistently produces lengthy responses of inferior quality that remain very close to the SFT responses. Motivated by these findings, we propose an embarrassingly simple extension to the DPO algorithm, LN-DPO, which yields more concise responses without sacrificing quality relative to the policy obtained by vanilla DPO.
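As a rough illustration rather than the paper's exact formulation, one plausible reading of LN-DPO is that it length-normalizes the implicit reward inside the standard DPO objective. A minimal sketch under that assumption, with $\pi_\theta$ the policy, $\pi_{\mathrm{ref}}$ the SFT reference, $(x, y_w, y_l)$ a preference triple, and $|y|$ the response length in tokens:

\[
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

\[
\mathcal{L}_{\mathrm{LN\text{-}DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\!\left(\frac{\beta}{|y_w|} \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \frac{\beta}{|y_l|} \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

Under this reading, dividing each log-ratio by the response length removes the incentive to accumulate implicit reward simply by emitting more tokens, which is consistent with the stated goal of more concise responses at comparable quality.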