Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its use raises a fundamental question: when is explicit reasoning actually beneficial? Empirical evidence reveals a striking paradox: CoT often yields marginal or even negative gains on factual and open-ended tasks while multiplying token consumption. In this work, we show that LLM reasoning is not a static property of tasks or models, but a \emph{dynamic decoding state} that emerges during generation. Through systematic analysis, we find that early-stage entropy dynamics provide a reliable signal of this state: tasks that benefit from CoT exhibit consistent entropy reduction, while others display unstable or increasing entropy. This behavior can be interpreted as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime. Based on these insights, we propose \textbf{EDRM} (Entropy Dynamics-based Reasoning Manifold), a lightweight, training-free routing framework that uses early decoding entropy to adaptively select inference strategies. EDRM embeds entropy trajectories into a compact and interpretable manifold representation, enabling both zero-shot deployment and fine-grained instance-level adaptation. Across 15 benchmarks and 4 LLMs of varying scales and architectures, EDRM consistently outperforms static baselines. At the dataset level, EDRM achieves \textbf{41--55\%} token reduction while improving accuracy, using as few as 50 calibration samples. At the instance level, it further improves accuracy by up to \textbf{4.7\%} while maintaining \textbf{27--45\%} token savings. These results suggest that reasoning should be invoked selectively rather than by default, and demonstrate the effectiveness of entropy-driven decoding control for efficient and adaptive LLM inference.
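To make the routing idea concrete, the following is a minimal sketch of entropy-trend routing, not the EDRM method itself. It assumes the next-token probability distributions for the first few decoding steps are already available, computes the Shannon entropy of each, and routes to CoT when the fitted entropy slope is negative. The function names (`token_entropy`, `entropy_slope`, `route`) and the `slope_threshold` parameter are hypothetical, and the manifold embedding and calibration described in the abstract are not reproduced here.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_slope(entropies: Sequence[float]) -> float:
    """Least-squares slope of entropy against decoding step index."""
    n = len(entropies)
    if n < 2:
        raise ValueError("need at least two decoding steps")
    mean_t = (n - 1) / 2.0
    mean_h = sum(entropies) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in enumerate(entropies))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


def route(early_probs: List[Sequence[float]], slope_threshold: float = 0.0) -> str:
    """Return "cot" if early entropy trends downward, else "direct".

    slope_threshold is a hypothetical knob; the abstract does not state
    how the actual decision boundary is set.
    """
    entropies = [token_entropy(p) for p in early_probs]
    return "cot" if entropy_slope(entropies) < slope_threshold else "direct"


# Toy distributions over a 4-token vocabulary (illustrative values only).
falling = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1], [0.97, 0.01, 0.01, 0.01]]
rising = list(reversed(falling))
print(route(falling), route(rising))
```

In practice the distributions would come from a softmax over the model's logits at each early decoding step. A single slope is only the simplest possible summary of an entropy trajectory and is used here purely for illustration.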