Recent work explores latent reasoning, which aims to improve reasoning efficiency by replacing explicit reasoning trajectories with continuous representations in a latent space, yet its effectiveness varies across settings. Our analysis of model confidence dynamics under latent reasoning reveals that thinking trajectories ending in incorrect answers contain fewer low-confidence steps than those ending in correct answers. We further suggest that soft embeddings aggregated from multiple low-confidence thinking alternatives may introduce and propagate noise, leading to high confidence in unreliable reasoning trajectories. Motivated by these observations, we propose ThinkRouter, an inference-time, confidence-aware routing mechanism that avoids such spurious high confidence and noise for efficient reasoning. ThinkRouter routes thinking to the discrete token space when model confidence is low, and to the latent space otherwise. Extensive experiments on STEM reasoning and coding benchmarks across diverse large reasoning models demonstrate that ThinkRouter outperforms explicit CoT, random routing, and latent reasoning baselines in accuracy, achieving an average improvement of 19.70 points in Pass@1 while reducing generation length by up to 15.55%. Further analysis reveals that ThinkRouter can calibrate errors arising from explicit CoT and latent reasoning, and that it accelerates end-of-thinking token generation by globally lowering model confidence.
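The routing rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how confidence is measured or how the threshold is chosen. Here confidence is assumed to be the maximum next-token probability, the threshold `tau` is a hypothetical parameter, the discrete branch is assumed to pick the most likely token greedily, and the latent branch is assumed to feed back a probability-weighted mixture of token embeddings. The names `softmax` and `route_step` are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_step(logits, embeddings, tau=0.5):
    """Choose the input embedding for the next thinking step.

    logits:     next-token logits, one per vocabulary entry
    embeddings: one embedding vector (list of floats) per vocabulary entry
    tau:        hypothetical confidence threshold (an assumption of this sketch)
    Returns (mode, embedding), where mode is "discrete" or "latent".
    """
    probs = softmax(logits)
    # Assumption: confidence is the maximum next-token probability.
    confidence = max(probs)
    if confidence < tau:
        # Low confidence: commit to one discrete token (greedy choice here)
        # instead of mixing several uncertain alternatives.
        token_id = probs.index(confidence)
        return "discrete", list(embeddings[token_id])
    # Otherwise: latent step, using a probability-weighted soft embedding.
    dim = len(embeddings[0])
    soft = [sum(p * emb[d] for p, emb in zip(probs, embeddings))
            for d in range(dim)]
    return "latent", soft


if __name__ == "__main__":
    toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(route_step([0.0, 0.0, 0.0], toy_embeddings))   # flat distribution
    print(route_step([10.0, 0.0, 0.0], toy_embeddings))  # peaked distribution
```

In this toy setup a flat distribution takes the discrete branch and a sharply peaked one takes the latent branch, mirroring the rule that low-confidence steps are kept out of the soft-embedding aggregation.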