Enabling large language models to use real-world tools effectively is crucial for achieving embodied intelligence. Existing approaches to tool learning either rely on extremely large language models, such as GPT-4, to attain generalized tool-use abilities in a zero-shot manner, or use supervised learning to train compact models on a limited set of tools. However, it remains uncertain whether smaller language models can achieve generalized tool-use abilities without tool-specific training. To address this question, this paper introduces ToolAlpaca, a novel framework designed to automatically generate a tool-use corpus and learn generalized tool-use abilities on compact language models with minimal human intervention. Specifically, ToolAlpaca first builds a multi-agent simulation environment to collect a comprehensive dataset containing 3938 tool-use instances from more than 400 real-world tool APIs spanning 50 distinct categories. The constructed corpus is then used to fine-tune compact language models, resulting in two models, ToolAlpaca-7B and ToolAlpaca-13B. Finally, we evaluate the ability of these models to use previously unseen tools without specific training. Experimental results demonstrate that ToolAlpaca achieves effective generalized tool-use capabilities comparable to those of extremely large language models like GPT-3.5. This result supports the claim that learning generalized tool-use abilities is feasible for compact language models.
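To make the corpus-construction step concrete, the following is a minimal sketch of how a multi-agent simulation could produce one tool-use instance. It is an illustration under stated assumptions, not the ToolAlpaca implementation: the three roles (`user_agent`, `assistant_agent`, `tool_executor_agent`), the `ToolSpec` and `ToolUseInstance` structures, and the example `WeatherAPI` tool are all hypothetical, and plain functions stand in for the language-model calls a real environment would make.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical description of one tool API (name, category, functions)."""
    name: str
    category: str
    functions: dict  # function name -> short description


@dataclass
class ToolUseInstance:
    """One simulated interaction: instruction, action trace, final reply."""
    tool: str
    instruction: str
    steps: list = field(default_factory=list)
    final_response: str = ""


def user_agent(tool):
    # Assumption: stand-in for a model that drafts a user instruction.
    first_function = next(iter(tool.functions))
    return f"Please use {tool.name} to {tool.functions[first_function]}."


def assistant_agent(tool, instruction, observations):
    # Assumption: stand-in for a model that either calls a function or answers.
    if not observations:
        return {"action": next(iter(tool.functions)), "input": {}}
    return {"final": f"Done: {observations[-1]}"}


def tool_executor_agent(tool, action):
    # Assumption: stand-in for a simulator that returns a plausible API response.
    return f"{tool.name}.{action['action']} returned OK"


def simulate(tool, max_turns=4):
    """Run one simulated dialogue and record it as a training instance."""
    instance = ToolUseInstance(tool=tool.name, instruction=user_agent(tool))
    observations = []
    for _ in range(max_turns):
        decision = assistant_agent(tool, instance.instruction, observations)
        if "final" in decision:
            instance.final_response = decision["final"]
            break
        observation = tool_executor_agent(tool, decision)
        observations.append(observation)
        instance.steps.append({"action": decision, "observation": observation})
    return instance


# Hypothetical tool used only for illustration.
weather = ToolSpec("WeatherAPI", "Weather", {"get_forecast": "fetch tomorrow's forecast"})
corpus = [simulate(weather)]
```

Repeating `simulate` over many tool specifications would yield a corpus of instruction, action, and observation traces of the kind that could then be used for supervised fine-tuning.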