Latent reasoning models (LRMs) replace explicit chain-of-thought with continuous thoughts. Recent work treats observable latent-state patterns, such as BFS-like frontiers and decodable arithmetic computation, as evidence for internal reasoning mechanisms. We evaluate two LRMs (Coconut and CODI) against matched controls that lack the proposed recurrence or curriculum. We find that these patterns also appear in the controls and do not always causally affect behavior. Causal interventions show that latent-thought utilization is graded rather than binary: it scales with a thought's causal effect on model behavior. Geometric analyses further show that this effect concentrates in low-rank directions, and that the step-to-step geometry of these directions becomes more structured as their behavioral influence increases. Latent thoughts should therefore be treated as hidden computation, not hidden explanation: decodability, attention, or static structure alone cannot establish mechanism. LRM interpretability thus requires matched controls and causal tests.