Equipping Large Language Models (LLMs) with external tools enables them to solve complex real-world problems. However, robustness remains a critical challenge for existing methods when they confront novel or evolving tools. Existing trajectory-centric paradigms rely primarily on memorizing static solution paths during training, which limits the ability of LLMs to generalize tool usage to newly introduced or previously unseen tools. In this paper, we propose ToolMaster, a framework that shifts tool use from imitating golden tool-calling trajectories to actively learning tool usage through interaction with the environment. To optimize LLMs for tool planning and invocation, ToolMaster adopts a trial-and-execution paradigm: LLMs are first trained to imitate teacher-generated trajectories that contain explicit tool trials and self-correction, and are then optimized with reinforcement learning to jointly coordinate the trial and execution phases. This process enables agents to autonomously explore correct tool usage by actively interacting with environments and to form experiential knowledge that benefits tool execution. Experimental results demonstrate that ToolMaster significantly outperforms existing baselines in generalization and robustness on unseen or unfamiliar tools. All code and data are available at https://github.com/NEUIR/ToolMaster.
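The trial-and-execution idea described in the abstract can be illustrated with a toy sketch. This is not ToolMaster's implementation: it involves no LLM, no imitation learning, and no reinforcement learning. The names `ToyTool`, `Experience`, `trial_phase`, and `execution_phase` are hypothetical, and a hard-coded list of candidate argument names stands in for the guesses a model would make about an unfamiliar tool. The sketch only shows the control flow of probing a tool, treating errors as feedback, and reusing what was learned when executing the real call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyTool:
    """Hypothetical tool whose required argument name the agent does not know."""
    name: str
    required_arg: str
    fn: Callable[[float], float]

    def call(self, **kwargs):
        # Return (ok, observation); a bad call yields feedback, not an exception.
        if set(kwargs) != {self.required_arg}:
            return False, f"error: expected argument '{self.required_arg}'"
        return True, self.fn(kwargs[self.required_arg])


@dataclass
class Experience:
    """Notes gathered while trying the tool (an assumed, simplified format)."""
    working_arg: Optional[str] = None
    log: List[str] = field(default_factory=list)


def trial_phase(tool: ToyTool, candidate_args: List[str], probe_value: float = 1.0) -> Experience:
    """Probe the tool with candidate argument names and record every observation."""
    exp = Experience()
    for arg in candidate_args:
        ok, obs = tool.call(**{arg: probe_value})
        exp.log.append(f"{tool.name}({arg}=...) -> {obs}")
        if ok:
            exp.working_arg = arg  # self-correction: keep the pattern that worked
            break
    return exp


def execution_phase(tool: ToyTool, exp: Experience, value: float) -> float:
    """Make the real call using the pattern discovered during the trial phase."""
    if exp.working_arg is None:
        raise RuntimeError("no working call pattern found during trials")
    _, obs = tool.call(**{exp.working_arg: value})
    return obs


# Usage: an "unseen" Celsius-to-Fahrenheit tool with an unknown argument name.
tool = ToyTool(name="c_to_f", required_arg="celsius", fn=lambda c: c * 9 / 5 + 32)
exp = trial_phase(tool, ["temp", "value", "celsius"])
result = execution_phase(tool, exp, 100.0)
print(exp.log)
print(result)
```

In this sketch the first two trial calls fail and the third succeeds, so the execution phase reuses the argument name recorded in `exp.working_arg`. In the framework the abstract describes, a trained model would decide what to try and how to use the feedback; here that role is played by a fixed loop.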