Large reasoning models (LRMs) achieve strong performance on mathematical reasoning tasks, a strength often attributed to their ability to generate explicit chain-of-thought (CoT) explanations. However, recent work shows that LRMs often arrive at the correct answer before completing these textual reasoning steps, indicating the presence of latent reasoning -- internal, non-verbal computation encoded in hidden states. While this phenomenon has been explored in English, its multilingual behavior remains largely unknown. In this paper, we conduct a systematic investigation of multilingual latent reasoning in LRMs across 11 languages. Using a truncation-based strategy, we examine how the correct answer emerges when the model is given only a partial reasoning trace, allowing us to measure how the latent prediction forms step by step. Our results reveal clear evidence of multilingual latent reasoning, but it is uneven: strong in high-resource languages, weaker in low-resource ones, and broadly less observable on harder benchmarks. To understand whether these differences reflect distinct internal mechanisms, we further perform representational analyses. Despite surface-level disparities, we find that the internal evolution of predictions is highly consistent across languages and broadly aligns with English -- a pattern suggesting an English-centered latent reasoning pathway.
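A minimal sketch of how such a truncation-based probe could be implemented is given below. The model name, the answer-elicitation suffix, the numeric answer extraction, and the truncation grid are illustrative assumptions for exposition, not the exact experimental setup.

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice; any open-weight LRM with a visible CoT would do.
MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def answer_after_truncation(question: str, cot: str, fraction: float,
                            elicit: str = "\nTherefore, the final answer is") -> str:
    """Keep only the first `fraction` of the CoT tokens, then force an answer."""
    cot_ids = tokenizer(cot, add_special_tokens=False)["input_ids"]
    kept = tokenizer.decode(cot_ids[: int(len(cot_ids) * fraction)])
    prompt = f"{question}\n{kept}{elicit}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
    completion = tokenizer.decode(
        out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    match = re.search(r"-?\d+(?:\.\d+)?", completion)  # crude numeric-answer extraction
    return match.group(0) if match else completion.strip()

def formation_curve(question: str, cot: str, gold: str,
                    fractions=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict:
    """At which truncation fraction does the correct answer first appear?"""
    return {f: answer_after_truncation(question, cot, f) == gold for f in fractions}
```

Running such a probe per question and per language, and averaging over a benchmark, yields stepwise formation curves whose shape can then be compared against the English curve to assess cross-language consistency.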