Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) form another class of models; although they have historically been less prevalent in LLM development, they naturally characterize the optimal policy in post-training alignment. In this paper, we provide a unified view of these two model classes. Taking the chain rule of probability as a starting point, we establish an explicit bijection between ARMs and EBMs in function space, which we show to correspond to a special case of the soft Bellman equation in maximum entropy reinforcement learning. Building on this bijection, we derive the equivalence between supervised learning of ARMs and EBMs. Furthermore, we analyze the distillation of EBMs into ARMs and provide theoretical error bounds. Our results offer insight into the ability of ARMs to plan ahead, despite being based on the next-token prediction paradigm.
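For concreteness, the following is a minimal sketch of the correspondence the abstract describes, in our own notation (the paper's exact definitions may differ): an ARM factorizes the sequence probability $p_{\mathrm{ARM}}(x) = \prod_{t=1}^{T} \pi(x_t \mid x_{<t})$ via the chain rule, an EBM specifies it through an energy as $p_{\mathrm{EBM}}(x) = \exp(-E(x))/Z$, and equating the two yields next-token conditionals whose log-normalizers satisfy a soft Bellman recursion.

```latex
% Notation is ours, a sketch rather than the paper's definitions.
\begin{align*}
  % Soft value of a prefix: log-partition over all completions.
  V(x_{<t}) &:= \log \sum_{x_{t:T}} \exp\!\bigl(-E(x_{<t}, x_{t:T})\bigr), \\
  % Equating p_ARM = p_EBM gives the ARM conditionals as value differences:
  \pi(x_t \mid x_{<t}) &= \exp\!\bigl(V(x_{\le t}) - V(x_{<t})\bigr), \\
  % One-step consistency of V is a soft Bellman equation
  % (log-sum-exp in place of the hard max of standard dynamic programming):
  V(x_{<t}) &= \log \sum_{x_t} \exp\!\bigl(V(x_{\le t})\bigr).
\end{align*}
```

Multiplying the conditionals telescopes to $\exp(V(x_{\le T}) - V(\varnothing)) = \exp(-E(x))/Z$ (since $V(x_{\le T}) = -E(x)$ and $V(\varnothing) = \log Z$), recovering the EBM exactly. In this sketch, each next-token conditional depends on the soft value of all possible completions, which is one way to see how next-token prediction can encode look-ahead.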