Recently, small models with latent recursion have obtained promising results on complex reasoning tasks. These results are typically explained by the theory that such recursion increases a network's effective depth, allowing it to compactly emulate the capacity of larger models. However, the performance of recursively applied layers still lags behind that of one-pass models with the same feed-forward depth, which means that in the looped version not every recursive step effectively contributes to depth. This raises the question: when and why does latent reasoning improve performance, and when does it result in dead compute? In this work, we answer this question by analyzing the algorithms that latent reasoning implements. We show that latent reasoning can be formalized as a combination of classifier-free guidance and policy improvement. Building on these insights, we propose training schemes for latent reasoning models drawn from reinforcement learning and diffusion methods. Using the Tiny Recursive Model as our testbed, we show that our modifications avoid dead compute steps and reduce the total number of forward passes by 18x while maintaining performance. Broadly speaking, we show how a policy-improvement perspective on recursive steps can explain model behavior and provide insights for further improvements.