Effective tool use and reasoning are essential capabilities for large reasoning models~(LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool-use scenarios, leading to lazy reasoning. To address this, we propose a two-stage training framework, D-CORE~(\underline{\textbf{D}}ecomposing tasks and \underline{\textbf{Co}}mposing \underline{\textbf{Re}}asoning processes), which first incentivizes the LRMs' task-decomposition reasoning capability via self-distillation, followed by diversity-aware reinforcement learning~(RL) to restore the LRMs' reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate the superiority of our method: D-CORE-8B reaches 77.7\% accuracy, surpassing the best-performing 8B model by 5.7\%. Meanwhile, D-CORE-14B establishes a new state of the art at 79.3\%, outperforming 70B models despite being 5$\times$ smaller. The source code is available at https://github.com/alibaba/EfficientAI.