TODO: Enhancing LLM Alignment with Ternary Preferences

Aligning large language models (LLMs) with human intent is critical for enhancing their performance across a variety of tasks. Standard alignment techniques, such as Direct Preference Optimization (DPO), often rely on the binary Bradley-Terry (BT) model, which can struggle to capture the complexities of human preferences, particularly in the presence of noisy or inconsistent labels and frequent ties. To address these limitations, we introduce the Tie-rank Oriented Bradley-Terry model (TOBT), an extension of the BT model that explicitly incorporates ties, enabling more nuanced preference representation. Building on this, we propose Tie-rank Oriented Direct Preference Optimization (TODO), a novel alignment algorithm that leverages TOBT's ternary ranking system to improve preference alignment. In evaluations on Mistral-7B and Llama 3-8B models, TODO consistently outperforms DPO in modeling preferences across both in-distribution and out-of-distribution datasets. Additional assessments using MT-Bench and benchmarks such as PIQA, ARC-c, and MMLU further demonstrate TODO's superior alignment performance. Notably, TODO also shows strong results in binary preference alignment, highlighting its versatility and potential for broader integration into LLM alignment. The implementation is available at https://github.com/XXares/TODO.

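To make the idea of a tie-aware Bradley-Terry model concrete, here is a minimal sketch of one classical way to add ties to BT, the Rao-Kupper extension. This is an illustrative assumption and not necessarily the exact TOBT formulation: the abstract does not state the formula. The function names, the tie parameter `alpha`, and its default value are hypothetical. In a DPO-style setting the two rewards would typically be the implicit rewards derived from policy and reference log-probabilities.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ternary_preference_probs(reward_a: float, reward_b: float, alpha: float = 0.5):
    """Return (P(a wins), P(b wins), P(tie)) under a Rao-Kupper style model.

    Assumption: ties are controlled by a margin alpha >= 0 on the reward
    difference. With alpha = 0 this reduces to the binary Bradley-Terry model.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    diff = reward_a - reward_b
    p_a = sigmoid(diff - alpha)   # a must beat b by more than the margin
    p_b = sigmoid(-diff - alpha)  # symmetric case for b
    p_tie = max(0.0, 1.0 - p_a - p_b)  # remaining mass; clamp rounding error
    return p_a, p_b, p_tie


def ternary_nll(reward_a: float, reward_b: float, label: str, alpha: float = 0.5) -> float:
    """Negative log-likelihood of one labeled pair; label is 'a', 'b', or 'tie'."""
    p_a, p_b, p_tie = ternary_preference_probs(reward_a, reward_b, alpha)
    prob = {"a": p_a, "b": p_b, "tie": p_tie}[label]
    return -math.log(max(prob, 1e-12))  # clamp to avoid log(0)


if __name__ == "__main__":
    print(ternary_preference_probs(1.2, 0.3))
    print(ternary_nll(0.1, 0.0, "tie"))
```

Under this sketch, the tie probability is largest when the two rewards are equal and shrinks as the gap between them grows, which is the behavior a ternary preference model needs in order to use tied pairs as a training signal rather than discarding them.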
