When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its use raises a fundamental question: when is explicit reasoning actually beneficial? Empirical evidence reveals a striking paradox: on factual and open-ended tasks, CoT often yields marginal or even negative gains while multiplying token consumption. In this work, we show that LLM reasoning is not a static property of tasks or models, but a \emph{dynamic decoding state} that emerges during generation. Through systematic analysis, we find that early-stage entropy dynamics provide a reliable signal of this state: tasks that benefit from CoT exhibit consistent entropy reduction, while others display unstable or increasing entropy. This behavior can be interpreted as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime. Based on these insights, we propose \textbf{EDRM} (Entropy Dynamics-based Reasoning Manifold), a lightweight, training-free routing framework that uses early decoding entropy to adaptively select an inference strategy. EDRM embeds entropy trajectories into a compact and interpretable manifold representation, enabling both zero-shot deployment and fine-grained instance-level adaptation. Across 15 benchmarks and 4 LLMs of varying scales and architectures, EDRM consistently outperforms static baselines. At the dataset level, EDRM achieves \textbf{41--55\%} token reduction while improving accuracy, using as few as 50 calibration samples. At the instance level, it further improves accuracy by up to \textbf{4.7\%} while maintaining \textbf{27--45\%} token savings. These results suggest that reasoning should be invoked selectively rather than by default, and demonstrate the effectiveness of entropy-driven decoding control for efficient and adaptive LLM inference.
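To make the routing idea concrete, the following is a minimal sketch of how an early-decoding entropy signal could drive a choice between CoT and direct answering. It is not the EDRM method itself: the abstract does not specify how entropy trajectories are embedded or thresholded, so the least-squares slope summary, the function names (`token_entropy`, `entropy_slope`, `route`), and the default `slope_threshold=0.0` are all assumptions made for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_slope(entropies):
    """Least-squares slope of entropy against the decoding step index."""
    n = len(entropies)
    if n < 2:
        raise ValueError("need at least two decoding steps")
    mean_t = (n - 1) / 2.0
    mean_h = sum(entropies) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in enumerate(entropies))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


def route(step_distributions, slope_threshold=0.0):
    """Hypothetical router: pick 'cot' when early entropy is falling.

    step_distributions holds one next-token probability distribution per
    early decoding step. The threshold is an assumed hyperparameter that
    would be set on calibration samples, not a value from the paper.
    """
    entropies = [token_entropy(p) for p in step_distributions]
    return "cot" if entropy_slope(entropies) < slope_threshold else "direct"


# Toy trajectories: one that sharpens over time, one that flattens.
sharpening = [[0.25] * 4, [0.7, 0.1, 0.1, 0.1], [0.97, 0.01, 0.01, 0.01]]
flattening = list(reversed(sharpening))
```

Under these assumptions, `route(sharpening)` selects CoT because entropy falls across the early steps, while `route(flattening)` selects direct answering. A slope is only one possible summary of a trajectory; the manifold representation described above is presumably richer.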
