Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) are another class of models; although historically less prevalent in LLM development, they naturally characterize the optimal policy in post-training alignment. In this paper, we provide a unified view of these two model classes. Taking the chain rule of probability as a starting point, we establish an explicit bijection between ARMs and EBMs in function space, which we show to correspond to a special case of the soft Bellman equation in maximum entropy reinforcement learning. Building upon this bijection, we derive the equivalence between supervised learning of ARMs and EBMs. Furthermore, we analyze the distillation of EBMs into ARMs by providing theoretical error bounds. Our results provide insights into the ability of ARMs to plan ahead, despite their reliance on next-token prediction.
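As a minimal toy illustration of the chain-rule correspondence (a sketch under stated assumptions, not the construction used in the paper): for an EBM p(y) ∝ exp(r(y)) over fixed-length sequences, define the soft value V(prefix) as the log-sum-exp of r over all completions of that prefix. V then satisfies a soft Bellman recursion with deterministic token-appending transitions and zero intermediate reward, and the next-token conditionals π(y_t | y_<t) = exp(V(y_≤t) − V(y_<t)) telescope under the chain rule to exp(r(y))/Z. The vocabulary, sequence length, and reward function `reward` below are hypothetical choices made only for illustration.

```python
import itertools
import math
from functools import lru_cache

VOCAB = (0, 1, 2)  # hypothetical toy vocabulary
LENGTH = 3         # fixed sequence length (assumption: no EOS token)


def reward(seq):
    """Hypothetical sequence-level reward r(y); the EBM is p(y) ∝ exp(r(y))."""
    # Arbitrary illustrative choice: favour sequences whose tokens sum to 3,
    # plus a small bonus for ending in token 2.
    return -abs(sum(seq) - 3) + 0.5 * (seq[-1] == 2)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


@lru_cache(maxsize=None)
def soft_value(prefix):
    """Soft value V(prefix): log of the sum of exp(r) over all completions."""
    if len(prefix) == LENGTH:
        return reward(prefix)  # terminal state: V equals the reward
    # Soft Bellman backup with deterministic transitions and zero per-step
    # reward: V(s) = log sum_a exp(V(s + a)).
    return logsumexp([soft_value(prefix + (a,)) for a in VOCAB])


def next_token_prob(prefix, token):
    """ARM conditional pi(token | prefix) = exp(V(prefix + token) - V(prefix))."""
    return math.exp(soft_value(prefix + (token,)) - soft_value(prefix))


def arm_prob(seq):
    """Chain rule: p(y) = prod_t pi(y_t | y_<t)."""
    p = 1.0
    for t in range(len(seq)):
        p *= next_token_prob(tuple(seq[:t]), seq[t])
    return p


def ebm_prob(seq):
    """Brute-force normalised EBM probability exp(r(y)) / Z."""
    all_seqs = list(itertools.product(VOCAB, repeat=LENGTH))
    log_z = logsumexp([reward(s) for s in all_seqs])
    return math.exp(reward(tuple(seq)) - log_z)


if __name__ == "__main__":
    # Check that the autoregressive factorisation reproduces the EBM exactly.
    for seq in itertools.product(VOCAB, repeat=LENGTH):
        assert math.isclose(arm_prob(seq), ebm_prob(seq), rel_tol=1e-9)
    print("ARM and EBM distributions agree on all", len(VOCAB) ** LENGTH, "sequences")
```

The brute-force partition function is only feasible for tiny vocabularies and lengths; the point of the sketch is that each conditional depends on the future only through V, which is one way to read the claim that next-token predictors can encode lookahead.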