Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of inference that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points. We believe these findings help translate mechanistic insights into practical guidance for architectural design.
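To make the notion of cyclic fixed points concrete, the following toy sketch loops a small block of layers and tracks the hidden state after each layer across recurrences. It is an illustration under stated assumptions, not the models or the analysis studied in this paper: the layers are hypothetical `tanh` maps with weights rescaled to a spectral norm below 1 (which makes the block a contraction and so guarantees convergence), the input `x` is re-injected at every layer, and the names `make_layer` and `run_looped_block` are invented for this example.

```python
import numpy as np


def make_layer(rng, dim, spectral_norm=0.5):
    """Build one toy layer (W, b) with W rescaled to the given spectral norm.

    Assumption for this sketch: a spectral norm below 1 makes each layer a
    contraction, since tanh is 1-Lipschitz. Trained models need not satisfy this.
    """
    W = rng.standard_normal((dim, dim))
    W *= spectral_norm / np.linalg.norm(W, 2)  # ord=2 is the largest singular value
    b = rng.standard_normal(dim)
    return W, b


def run_looped_block(layers, x, num_recurrences):
    """Apply the block repeatedly and record the state after every layer.

    Returns an array of shape (num_recurrences, block_size, dim).
    """
    h = np.zeros_like(x)
    trajectory = []
    for _ in range(num_recurrences):
        states = []
        for W, b in layers:
            h = np.tanh(W @ h + b + x)  # input injection: x is added at each layer
            states.append(h.copy())
        trajectory.append(states)
    return np.array(trajectory)


rng = np.random.default_rng(0)
dim, block_size = 16, 3
layers = [make_layer(rng, dim) for _ in range(block_size)]
x = rng.standard_normal(dim)

traj = run_looped_block(layers, x, num_recurrences=50)

# Change in each layer's state between consecutive recurrences, shape (49, block_size).
step_change = np.linalg.norm(traj[1:] - traj[:-1], axis=-1)

# Distance between the converged states of the first two layers in the cycle.
final = traj[-1]
layer_gap = np.linalg.norm(final[0] - final[1])

print("max per-layer change at the last recurrence:", step_change[-1].max())
print("distance between layer 0 and layer 1 fixed points:", layer_gap)
```

In this toy setting, each layer position settles to its own point while the points for different positions remain apart, so the block traces the same cycle through latent space on every recurrence. Whether a trained looped model behaves this way is an empirical question that depends on factors such as block size, input injection, and normalization.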