Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

Direct Preference Optimisation (DPO) can significantly improve the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the relative probability of picking one response over another. In this work, we first show theoretically that the standard DPO loss can reduce the model's likelihood of the preferred examples, as long as the relative probability between the preferred and dispreferred classes increases. We then show empirically that this phenomenon occurs when fine-tuning LLMs on common datasets, especially datasets in which the edit distance between pairs of completions is low. Using these insights, we design DPO-Positive (DPOP), a new loss function and training procedure that avoids this failure mode. Surprisingly, we find that DPOP outperforms DPO and other fine-tuning procedures across a wide variety of datasets and downstream tasks, including datasets with high edit distances between completions. Furthermore, we find that the DPOP-tuned model outperforms the DPO-tuned model (all else equal) on benchmarks independent of the fine-tuning data, such as MT-Bench. Finally, using DPOP, we create and open-source Smaug-34B and Smaug-72B, the latter becoming the first open-source LLM to surpass an average accuracy of 80% on the HuggingFace Open LLM Leaderboard.
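The failure mode described above can be illustrated numerically. The sketch below uses the standard per-pair DPO loss, which this abstract does not spell out, and contrasts it with a DPO variant that adds a hinge penalty when the policy's log-likelihood of the preferred completion drops below the reference model's. That penalty is our assumed reading of the DPOP idea, not the paper's exact specification; the function names, the log-probability values, and the defaults `beta=0.1` and `lam=5.0` are all hypothetical and chosen only for illustration.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pol_w: float, pol_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    pol_* / ref_*: summed log-probabilities of the preferred (w) and
    dispreferred (l) completions under the trained policy and the frozen
    reference model. Only the *difference* of log-ratios enters the loss.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def dpop_style_loss(pol_w: float, pol_l: float, ref_w: float, ref_l: float,
                    beta: float = 0.1, lam: float = 5.0) -> float:
    """DPO plus a hinge penalty on the preferred completion (assumed form).

    The penalty is zero while the policy assigns the preferred completion at
    least the reference log-likelihood, and grows linearly once it falls below.
    `lam` is a hypothetical penalty weight, not a value from the paper.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    penalty = max(0.0, ref_w - pol_w)
    return -log_sigmoid(beta * (margin - lam * penalty))


if __name__ == "__main__":
    ref_w, ref_l = -20.0, -22.0  # hypothetical reference log-probs

    # Policy identical to the reference: margin is zero.
    start = dpo_loss(ref_w, ref_l, ref_w, ref_l)

    # Both completions become LESS likely, but the dispreferred one drops
    # further, so the margin grows and the plain DPO loss still decreases.
    both_down = dpo_loss(-25.0, -35.0, ref_w, ref_l)
    both_down_pos = dpop_style_loss(-25.0, -35.0, ref_w, ref_l)

    print(f"DPO at start:                  {start:.4f}")
    print(f"DPO, both log-probs lowered:   {both_down:.4f}")
    print(f"Penalised, same policy:        {both_down_pos:.4f}")
```

With the policy lowering the preferred completion's log-probability from -20 to -25, the plain DPO loss still falls below its starting value of log 2, because only the relative margin is rewarded. The penalised variant instead rises above log 2, so gradient descent is pushed back toward keeping the preferred completion at least as likely as under the reference model. When the preferred log-probability does not fall, the penalty is zero and the two losses coincide.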
