Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization guarantees. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. We derive rigorous $H$-consistency bounds that depend on enforcing a separation margin $\gamma$. Crucially, we extend this to Structure-Aware $H$-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and model limitations via the Margin-Capacity Profile, proving that heavy-tailed surrogates (such as the Polynomial Hinge family) offer superior consistency guarantees for capacity-bounded models compared to the standard logistic loss used in DPO.
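To make the margin-shifted idea concrete, the following is a minimal sketch rather than the paper's actual objective. It assumes the standard DPO score (the scaled difference of policy-to-reference log-ratios) and shifts the logistic surrogate by a margin. The function names, the linear margin schedule in `structure_aware_margin`, and the convention that semantic distance lies in [0, 1] are all illustrative assumptions; the abstract does not specify how SA-DPO maps distance to margin.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_score(logp_w: float, logp_l: float,
              ref_logp_w: float, ref_logp_l: float,
              beta: float = 0.1) -> float:
    """Standard DPO implicit-reward gap between the preferred (w)
    and dispreferred (l) response."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def margin_shifted_logistic(score: float, gamma: float = 0.0) -> float:
    """Logistic surrogate shifted by a margin gamma.

    With gamma = 0 this is the usual DPO loss, -log(sigmoid(score)).
    """
    return softplus(-(score - gamma))


def structure_aware_margin(distance: float, gamma_max: float = 1.0) -> float:
    """Hypothetical margin schedule (an assumption, not from the paper):
    near-synonymous pairs (distance close to 0) get a small margin,
    semantically distant pairs get a margin up to gamma_max."""
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance is assumed to lie in [0, 1]")
    return gamma_max * distance


# Example: one preference pair with made-up log-probabilities.
score = dpo_score(logp_w=-1.0, logp_l=-2.0, ref_logp_w=-1.5, ref_logp_l=-1.5)
plain_loss = margin_shifted_logistic(score)
shifted_loss = margin_shifted_logistic(score, structure_aware_margin(0.8))
```

Under this sketch, a positive margin penalizes pairs whose score gap falls short of $\gamma$, so `shifted_loss` is larger than `plain_loss` for the same pair, while a pair at distance 0 reduces to the ordinary DPO loss.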