Large Reasoning Models (LRMs) achieve strong performance through explicit chain-of-thought reasoning but suffer from \textit{overthinking}: generating excessive reasoning tokens even for trivial queries. Beyond inflating cost, overthinking can be self-defeating: models enter recursive self-doubt loops that exhaust token budgets without producing an answer, causing API timeouts that directly hurt accuracy. We present an empirical study showing that \textbf{batch prompting}, originally introduced for throughput optimization, effectively suppresses overthinking at inference time. Across 13 diverse benchmarks with DeepSeek-R1 and OpenAI-o1, batch prompting reduces reasoning tokens by 76\% on average (2{,}950 $\to$ 710) while preserving or improving accuracy. Through behavioral analysis, we find that batching induces three beneficial effects: (1) it reduces per-query reasoning effort when multiple queries share a context; (2) it enables pattern induction, where models generalize from earlier examples to solve later ones; and (3) it suppresses hedging behavior (e.g., ``\texttt{wait,}'' ``\texttt{let me double-check}'') that signals metacognitive loops. We also show that explicit prompt constraints (``\texttt{Use no more than 100 tokens in thinking.}'') fail to reduce overthinking; models either ignore them or sacrifice accuracy. These findings reframe batch prompting as more than a cost optimization: it is a practical inference-time technique that improves efficiency and reliability without model modification.
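The mechanics of batch prompting are simple: several independent queries are concatenated into one numbered prompt, and the single response is split back into per-query answers. The sketch below illustrates this, assuming a `Q1:`/`A1:` numbering convention; the helper names and the prompt template are illustrative, not the paper's exact format.

```python
import re

def build_batch_prompt(queries):
    """Concatenate several independent queries into one numbered prompt.

    The instruction line and Q-numbering scheme are an assumed convention;
    the paper's actual template may differ.
    """
    lines = ["Answer each question. Prefix each answer with its number, e.g. 'A1:'."]
    for i, q in enumerate(queries, start=1):
        lines.append(f"Q{i}: {q}")
    return "\n".join(lines)

def parse_batch_answers(response, n):
    """Split a response of the form 'A1: ...\\nA2: ...' into per-query answers.

    Queries the model skipped map to empty strings, so callers can detect
    and retry missing answers.
    """
    answers = {}
    for m in re.finditer(r"A(\d+):\s*(.*?)(?=\nA\d+:|\Z)", response, flags=re.S):
        answers[int(m.group(1))] = m.group(2).strip()
    return [answers.get(i, "") for i in range(1, n + 1)]
```

The resulting prompt is sent as one request, so the model reasons once over the shared context instead of spinning up a separate chain of thought per query.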