Entropy-Gated Latent Recursion

Inference-time scaling has become the dominant lever for improving language-model reasoning, but existing methods draw rollout diversity from a single source: stochastic token-level sampling. We argue that this single-axis sampling space is fundamentally limiting. We identify a second axis that is fully deterministic and complementary to sampling: the layer span $L$, i.e., how many of a frozen model's top decoder layers are recursively re-applied at high-uncertainty tokens. Different choices of $L$ produce distinct rollouts that solve different subsets of problems, without any stochasticity. We instantiate this axis through Entropy-Gated Latent Recursion (EGLR), a training-free decoding procedure that, at high-entropy tokens, re-applies the top $L$ layers until the next-token distribution converges or $K_{\max}$ iterations are reached. Combined with $T$ temperature samples, EGLR turns a single-axis stochastic rollout pool into an $L\times T$ Cartesian sampling space at nearly the same per-rollout cost. We characterize this space across $8$ instruction-tuned models and $6$ math reasoning benchmarks and show that the $L$-axis is genuinely complementary to temperature. On MATH-500 with Qwen2.5-3B-Instruct, the joint $L\times T$ oracle reaches $91.6\%$. This is $8.2$ percentage points above the temperature-only oracle ($83.4\%$) and $10.4$ points above the layer-only oracle ($81.2\%$), confirming that the two axes solve complementary sets of problems. The expanded rollout pool provides richer per-prompt candidates for any downstream procedure that consumes rollouts. Examples include self-consistency, best-of-$N$ selection with verifiers, and group-relative RL training (GRPO). This opens a direction for inference-time scaling that does not rely on stochastic noise.
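The decoding step described above can be sketched as follows. This is a minimal toy sketch under stated assumptions, not the authors' implementation. The gate is a hypothetical entropy threshold `tau`. "Convergence" is assumed to mean that the total-variation distance between successive next-token distributions falls below a hypothetical tolerance `eps`. The re-applied top-$L$ layers are assumed to receive the top layer's own output as input. The "model" is a random residual stack standing in for a frozen decoder. A real transformer would also need to handle attention over earlier positions, which this sketch ignores.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def entropy(p):
    p = p[p > 0]                       # treat 0 * log 0 as 0
    return float(-(p * np.log(p)).sum())


def eglr_step(layers, unembed, h, L, k_max, tau, eps):
    """One greedy decoding step with entropy-gated recursion (toy sketch).

    layers  : list of callables, hidden vector -> hidden vector (bottom to top)
    unembed : (vocab, d) array mapping the final hidden state to logits
    h       : input hidden state for the current position
    L       : number of top layers to re-apply, 1 <= L <= len(layers)
    k_max   : maximum number of recursion iterations
    tau     : entropy threshold that opens the gate (hypothetical parameter)
    eps     : total-variation tolerance for convergence (hypothetical parameter)
    Returns (token_id, iterations_used).
    """
    assert 1 <= L <= len(layers)
    for layer in layers:               # ordinary forward pass through all layers
        h = layer(h)
    p = softmax(unembed @ h)
    iters = 0
    if entropy(p) > tau:               # gate: recurse only at high-entropy tokens
        top = layers[len(layers) - L:]
        while iters < k_max:
            for layer in top:          # feed the top hidden state back through the top-L layers
                h = layer(h)
            p_new = softmax(unembed @ h)
            iters += 1
            done = 0.5 * np.abs(p_new - p).sum() < eps
            p = p_new
            if done:                   # next-token distribution has (approximately) converged
                break
    return int(np.argmax(p)), iters


# Toy 6-layer residual "model" with random weights, for illustration only.
rng = np.random.default_rng(0)
d, vocab, n_layers = 16, 32, 6
Ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(n_layers)]
layers = [lambda h, W=W: h + np.tanh(W @ h) for W in Ws]
unembed = rng.normal(size=(vocab, d))
h0 = rng.normal(size=d)

# Sweeping L yields one deterministic rollout step per layer span; no sampling is involved.
for L in range(1, n_layers + 1):
    print(L, eglr_step(layers, unembed, h0, L, k_max=4, tau=0.5, eps=1e-3))
```

Every operation here is deterministic, so each value of `L` gives exactly one rollout per prompt. To populate the temperature axis as well, one would sample from the final distribution at each temperature instead of taking the argmax.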
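The oracle figures above count a problem as solved when at least one rollout in the pool is correct. The sketch below shows that bookkeeping on a hand-made toy table, not on any reported data. Several choices are assumptions for illustration: the grid values, the convention that `L = 0` means plain decoding without recursion, and the definitions of the single-axis pools. The temperature-only pool is the plain model at every temperature, and the layer-only pool is greedy decoding at every span.

```python
from itertools import product


def oracle_accuracy(solved_by, configs):
    """Fraction of problems for which at least one config in `configs` is correct.

    solved_by: one set per problem, holding the (L, temperature) configs whose
               rollout produced a correct answer.
    """
    configs = set(configs)
    hits = sum(1 for solved in solved_by if solved & configs)
    return hits / len(solved_by)


# Hypothetical grid: L = 0 stands for plain decoding, temperature 0.0 for greedy.
spans = [0, 2, 4]
temps = [0.0, 0.6, 1.0]

temperature_only = [(0, t) for t in temps]   # plain model, every temperature
layer_only = [(L, 0.0) for L in spans]       # greedy decoding, every layer span
joint = list(product(spans, temps))          # full L x T Cartesian pool

# Toy correctness table for five problems (illustrative, not measured).
solved_by = [
    {(0, 0.0)},   # solved by plain greedy decoding
    {(0, 0.6)},   # solved only by temperature sampling
    {(2, 0.0)},   # solved only by a deterministic layer span
    {(4, 1.0)},   # solved only by a cross (L, T) combination
    set(),        # never solved
]

print(oracle_accuracy(solved_by, temperature_only),
      oracle_accuracy(solved_by, layer_only),
      oracle_accuracy(solved_by, joint))   # 0.4 0.4 0.8
```

Under this construction the joint pool contains both single-axis pools, so its oracle can never fall below either of them. Any gap above the better single-axis oracle comes from problems solved only by the other axis or only by a cross combination.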
