Large language models (LLMs) rely on tool use to act as autonomous agents, yet they often fail in multi-step execution because of insufficient tool-related knowledge and ineffective knowledge activation. We therefore present a systematic study of how knowledge influences tool-use performance, covering three stages: knowledge acquisition, activation, and internalization. In the acquisition stage, we collect and evaluate various forms of experiential knowledge; our analysis shows that simple instance-level knowledge already provides strong and reliable gains, while abstract intent-level knowledge offers limited benefit. At inference time, for knowledge activation, we find that prompting the LLM to expand the depth of reasoning yields diminishing returns, whereas expanding the width of reasoning through parallel sampling with aggregation activates latent experiential knowledge more effectively. At training time, for knowledge internalization, post-training on knowledge-augmented data further improves performance, with reinforcement learning outperforming supervised fine-tuning. Based on these insights, we propose Knowledge-Augmented Tool Execution (KATE), a framework that integrates experiential knowledge with reasoning-width-expanded inference and knowledge-aware training. Experiments on BFCL-V3 and AppWorld demonstrate consistent and substantial improvements over strong baselines across model scales. Our code is available at https://github.com/hypasd-art/KATE.
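To make "expanding the width of reasoning by parallel sampling with aggregation" concrete, the following is a minimal sketch and not the KATE implementation. The abstract does not specify the aggregation rule, so majority voting over canonicalized tool calls is an assumption here. The names `sample_and_aggregate`, `canonical`, and `mock_sampler` are hypothetical, and `mock_sampler` is a deterministic stand-in for a stochastic LLM call.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A tool call is modeled as a plain dict, e.g.
# {"name": "get_weather", "arguments": {"city": "Paris"}}.
ToolCall = dict


def canonical(call: ToolCall) -> str:
    """Serialize a tool call with sorted keys so equal calls compare equal."""
    return json.dumps(call, sort_keys=True)


def sample_and_aggregate(
    sample_fn: Callable[[str, int], ToolCall],
    prompt: str,
    width: int = 5,
) -> ToolCall:
    """Draw `width` candidates in parallel and return the most frequent one.

    Assumption: aggregation is a simple majority vote; other rules
    (for example, asking a model to merge the candidates) are possible.
    """
    with ThreadPoolExecutor(max_workers=width) as pool:
        candidates = list(pool.map(lambda seed: sample_fn(prompt, seed), range(width)))
    votes = Counter(canonical(c) for c in candidates)
    winner, _count = votes.most_common(1)[0]
    return json.loads(winner)


def mock_sampler(prompt: str, seed: int) -> ToolCall:
    """Hypothetical stand-in for one sampled LLM response."""
    if seed % 3 == 0:
        # A minority of samples pick the wrong unit.
        return {"name": "get_weather", "arguments": {"city": "Paris", "unit": "kelvin"}}
    if seed == 2:
        # Same call as the majority, but with a different key order.
        return {"name": "get_weather", "arguments": {"unit": "celsius", "city": "Paris"}}
    return {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}


if __name__ == "__main__":
    result = sample_and_aggregate(mock_sampler, "What is the weather in Paris?", width=5)
    print(result)
```

With a width of 1 the sketch reduces to ordinary single-sample decoding, so the vote only matters once several candidates are drawn. Canonicalizing before voting keeps calls that differ only in key order from splitting the vote.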