Tool-Integrated Reasoning (TIR) has significantly enhanced the capabilities of Large Language Models (LLMs), yet current agents tend to exhibit cognitive offloading: they redundantly invoke external tools even for simple tasks. In this paper, we argue that true agentic intelligence requires not just the ability to invoke tools, but the adaptive judgment to discern when tools are needed. We propose AdaTIR, a framework that shifts the paradigm from static tool invocation to difficulty-aware reasoning internalization. By introducing a difficulty-aware efficiency reward, AdaTIR dynamically adjusts tool budgets based on task complexity, internalizing reasoning for simple tasks while selectively invoking tools for complex ones. Furthermore, we identify a sign reversal problem: when tool penalties outweigh correctness rewards, correct rollouts receive negative advantages and are mistakenly penalized. To resolve this, we propose Clipped Advantage Shaping (CAS), which ensures that correctness remains the primary objective while efficiency acts as a secondary constraint. Empirical results demonstrate that AdaTIR reduces tool calls by up to 97.6% on simple tasks and 28.2% on complex tasks while maintaining or improving accuracy. Notably, AdaTIR successfully internalizes reasoning, outperforming baselines by 4.8% on AIME 2024 even when tool access is strictly disabled.
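The sign reversal problem can be illustrated with a small numerical sketch. The abstract does not give the reward or the CAS formula, so everything below is an assumption made for illustration: a mean-centered group-relative advantage, a linear per-call tool penalty (`penalty`), and a clipping rule that keeps the advantage of a correct rollout non-negative and that of an incorrect rollout non-positive. The function names are hypothetical and this is not presented as the paper's implementation.

```python
from statistics import mean


def group_advantages(rewards):
    """Mean-centered group-relative advantages (assumed form; no std scaling)."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def naive_advantages(correct, tool_calls, penalty=0.3):
    """Assumed naive shaping: correctness reward minus a linear tool penalty."""
    rewards = [
        (1.0 if c else 0.0) - penalty * t
        for c, t in zip(correct, tool_calls)
    ]
    return group_advantages(rewards)


def clipped_advantages(correct, tool_calls, penalty=0.3):
    """Hypothetical clipping rule: the sign of the advantage follows correctness.

    Correct rollouts are clipped to be >= 0 and incorrect rollouts to be <= 0,
    so the tool penalty can reorder rollouts within each class but cannot make
    a correct rollout look worse than doing nothing.
    """
    shaped = naive_advantages(correct, tool_calls, penalty)
    return [
        max(a, 0.0) if c else min(a, 0.0)
        for c, a in zip(correct, shaped)
    ]


if __name__ == "__main__":
    # One group of four rollouts for the same prompt (made-up values).
    correct = [True, True, False, False]
    tool_calls = [0, 4, 0, 0]

    print("naive:  ", naive_advantages(correct, tool_calls))
    print("clipped:", clipped_advantages(correct, tool_calls))
```

In this made-up group, the second rollout is correct but uses four tool calls, so under the naive shaping its advantage is negative and lower than that of the incorrect rollouts. With the assumed clipping rule its advantage is raised to zero, while the tool-free correct rollout keeps the largest advantage, so efficiency still differentiates correct rollouts without overriding correctness.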