Adaptive reasoning is essential for aligning the computational effort of large language models (LLMs) with the intrinsic difficulty of problems. Current chain-of-thought methods boost reasoning ability but indiscriminately generate long explanations, leading to substantial inefficiency; meanwhile, existing reinforcement learning approaches to adaptive thinking remain unstable and heavily reward-dependent. Here we propose \textbf{DART}, a supervised \textbf{D}ifficulty-\textbf{A}daptive \textbf{R}easoning \textbf{T}runcation framework that adjusts thinking length according to problem difficulty. By distilling concise reasoning patterns from stronger models, interpolating them into a continuum of reasoning styles, and curating training data that balances correctness with compactness, DART learns when to ``stop thinking''. Across multiple mathematical benchmarks, experimental results demonstrate remarkable efficiency while preserving or improving accuracy, truncating reasoning by 81.2\% (DeepSeek-R1-Distill-Qwen-7B on the GSM8K dataset) with a 5.33$\times$ computational speedup. DART provides a stable and general paradigm for efficient reasoning, advancing the development of adaptive intelligence in LLMs.
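The abstract's curation step can be made concrete with a minimal sketch: for each problem, among candidate traces sampled from a continuum of reasoning styles, keep the shortest trace whose answer is correct, so easy problems are paired with short reasoning and hard problems with longer reasoning. All names below (\texttt{Trace}, \texttt{curate\_dart\_data}, \texttt{is\_correct}) are hypothetical illustrations under that assumption, not the paper's actual implementation.

\begin{verbatim}
# Hypothetical sketch of difficulty-adaptive data curation:
# keep the shortest correct reasoning trace per problem.
from dataclasses import dataclass

@dataclass
class Trace:
    problem: str    # problem statement
    reasoning: str  # chain-of-thought text
    answer: str     # final answer extracted from the trace

def is_correct(answer: str, gold: str) -> bool:
    # Placeholder check; a real pipeline would normalize and
    # compare mathematical expressions, not raw strings.
    return answer.strip() == gold.strip()

def curate_dart_data(candidates: dict, gold: dict) -> list:
    """candidates: problem -> list[Trace] sampled from a
    continuum of reasoning styles; gold: problem -> answer."""
    curated = []
    for problem, traces in candidates.items():
        correct = [t for t in traces
                   if is_correct(t.answer, gold[problem])]
        if correct:
            # Compactness criterion: fewest reasoning tokens wins.
            curated.append(min(correct,
                               key=lambda t: len(t.reasoning.split())))
    return curated
\end{verbatim}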