The preference optimization literature contains many proposed objectives, often presented as distinct improvements. We introduce Opal, a canonicalization algorithm that determines whether two preference objectives are algebraically equivalent by producing either a canonical form or a concrete witness of non-equivalence. Applying Opal reveals that many widely used methods optimize the same underlying objective, while others are provably distinct. For example, batch normalization can cause the same response pair to receive different gradients depending on batch composition. We identify a small set of structural mechanisms that give rise to genuinely different objectives; most remaining differences are reparameterizations.
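To make the batch-composition claim concrete, the following is a minimal sketch and not Opal itself. It assumes a logistic pairwise loss on scalar preference margins and uses mean-centering within the batch as a hypothetical stand-in for batch normalization; the function names, the margin values, and the finite-difference gradient are all illustrative assumptions. It compares the gradient received by one fixed margin when it is placed in two different batches.

```python
import numpy as np

def batch_loss(margins, normalize):
    """Sum of -log sigmoid(margin) over a batch of preference margins."""
    m = np.asarray(margins, dtype=float)
    if normalize:
        # Assumed stand-in for batch normalization: subtract the batch mean.
        m = m - m.mean()
    # -log sigmoid(m) equals log(1 + exp(-m)), computed stably.
    return np.sum(np.logaddexp(0.0, -m))

def grad_wrt_first(margins, normalize, h=1e-6):
    """Central-difference gradient of the batch loss w.r.t. the first margin."""
    m = np.asarray(margins, dtype=float)
    up, down = m.copy(), m.copy()
    up[0] += h
    down[0] -= h
    return (batch_loss(up, normalize) - batch_loss(down, normalize)) / (2 * h)

# The same response pair (margin 0.5) placed in two different batches.
shared = 0.5
batch_a = [shared, 1.0, -0.3]
batch_b = [shared, 4.0, -2.0]

plain_a = grad_wrt_first(batch_a, normalize=False)
plain_b = grad_wrt_first(batch_b, normalize=False)
normed_a = grad_wrt_first(batch_a, normalize=True)
normed_b = grad_wrt_first(batch_b, normalize=True)

print("without normalization:", plain_a, plain_b)
print("with batch centering: ", normed_a, normed_b)
```

Without normalization the loss decomposes over pairs, so the shared pair's gradient is the same in both batches. With the batch-dependent centering, its gradient depends on the other margins in the batch, which is the kind of structural difference that no per-pair reparameterization can remove.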