Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we find that such mode switching is largely driven by a small set of trigger tokens rather than by the instructions themselves. Through attention analysis and controlled prompting experiments, we show that a leading ``Okay'' token induces reasoning behavior, while the newline pattern following ``</think>'' suppresses it. Based on this observation, we propose Mid-Think, a simple, training-free prompting format that combines these triggers to achieve intermediate-budget reasoning, consistently outperforming fixed-token and prompt-based baselines on the accuracy-length trade-off. Furthermore, applying Mid-Think to RL training after SFT reduces training time by approximately 15% while improving the final performance of Qwen3-8B on AIME from 69.8% to 72.4% and on GPQA from 58.5% to 61.1%, demonstrating its effectiveness for both inference-time control and RL-based reasoning training.
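The abstract describes combining two trigger tokens to obtain intermediate-budget reasoning. The sketch below illustrates one plausible way such a prompt prefix could be assembled; the exact token order and template are assumptions for illustration, not the paper's verbatim Mid-Think format.

```python
# Hypothetical sketch of a trigger-based prompt prefix, assuming the two
# triggers named in the abstract: a leading "Okay" (induces reasoning) and
# the newline pattern after "</think>" (suppresses it). The combination
# shown here is an illustrative guess at the Mid-Think layout.

THINK_TRIGGER = "Okay"            # leading token that induces reasoning
SUPPRESS_PATTERN = "</think>\n\n"  # newline pattern that suppresses reasoning


def mid_think_prefix() -> str:
    """Combine both triggers to target an intermediate reasoning budget."""
    return THINK_TRIGGER + "\n" + SUPPRESS_PATTERN


def build_prompt(question: str) -> str:
    # Append the combined trigger as a forced assistant prefix after the
    # user question (a common way to steer decoding without training).
    return question + "\n" + mid_think_prefix()


print(build_prompt("What is 17 * 23?"))
```

In practice such a prefix would be injected via forced decoding (e.g. as the start of the assistant turn) so the model continues from the trigger tokens rather than generating them itself.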