Large Language Models (LLMs) can enhance their reasoning capabilities by using external tools. However, many tasks lack predefined tools. Prior work has explored instructing LLMs to generate their own tools, but such approaches depend heavily on the model's internal knowledge and struggle when tasks fall outside its scope. To address this limitation, we propose RefTool, a reference-guided framework for automatic tool creation that leverages external materials such as textbooks and knowledge snippets. RefTool consists of two modules: (1) tool creation, where LLMs generate executable tools from reference content, validate them using illustrative examples, and organize them hierarchically into a toolbox; and (2) tool utilization, where LLMs navigate the toolbox structure to select and apply the appropriate tools to solve problems. Experiments on causality, physics, and chemistry benchmarks demonstrate that RefTool outperforms existing tool-creation and domain-specific reasoning methods by 12.3% in average accuracy, while remaining cost-efficient and generalizing to non-scientific tasks such as extremely low-resource language translation. Analyses reveal that grounding tool creation in references produces accurate and faithful tools, and that the hierarchical structure facilitates effective tool selection. RefTool enables LLMs to overcome internal knowledge limitations, advancing generalizable reasoning in knowledge-intensive domains.
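The two modules described in the abstract can be illustrated with a minimal sketch. This is not the RefTool implementation: every name here (`Tool`, `Toolbox`, `validate`, `overlap`) is hypothetical, the tools are hand-written rather than generated by an LLM from reference text, and a simple keyword-overlap score stands in for the LLM's navigation of the hierarchy. It only shows the general shape of the idea: a tool is kept only if it reproduces its illustrative example, and selection proceeds top-down through the toolbox, first choosing a category and then a tool within it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Tool:
    """A candidate tool plus the illustrative example used to validate it."""
    name: str
    description: str
    func: Callable[..., float]
    example_args: Tuple[float, ...]   # inputs of the illustrative example
    example_expected: float           # answer the example is expected to give


def validate(tool: Tool, tol: float = 1e-6) -> bool:
    """Accept a tool only if it runs and reproduces its example (assumed check)."""
    try:
        return abs(tool.func(*tool.example_args) - tool.example_expected) <= tol
    except Exception:
        return False


def overlap(query: str, text: str) -> int:
    """Toy relevance score: shared lowercase words. Stands in for an LLM's choice."""
    return len(set(query.lower().split()) & set(text.lower().split()))


@dataclass
class Toolbox:
    """Two-level hierarchy: category name -> validated tools."""
    categories: Dict[str, List[Tool]] = field(default_factory=dict)

    def add(self, category: str, tool: Tool) -> bool:
        """Tool creation step: store the tool only if validation passes."""
        if not validate(tool):
            return False
        self.categories.setdefault(category, []).append(tool)
        return True

    def select(self, query: str) -> Optional[Tool]:
        """Tool utilization step: pick a category first, then a tool inside it."""
        if not self.categories:
            return None
        category = max(self.categories, key=lambda c: overlap(query, c))
        return max(self.categories[category],
                   key=lambda t: overlap(query, t.description))


# Hand-written example tools (in the paper's setting an LLM would write these).
kinetic = Tool("kinetic_energy", "kinetic energy of a mass m moving at speed v",
               lambda m, v: 0.5 * m * v ** 2, (2.0, 3.0), 9.0)
faulty = Tool("kinetic_energy_faulty", "kinetic energy of a mass m moving at speed v",
              lambda m, v: m * v ** 2, (2.0, 3.0), 9.0)  # missing factor 1/2
momentum = Tool("momentum", "linear momentum of a mass m moving at speed v",
                lambda m, v: m * v, (2.0, 3.0), 6.0)
gas = Tool("ideal_gas_pressure", "pressure of an ideal gas from n, T and V",
           lambda n, T, V: n * 8.314 * T / V, (1.0, 300.0, 1.0), 2494.2)

toolbox = Toolbox()
accepted = {
    "kinetic": toolbox.add("mechanics", kinetic),
    "faulty": toolbox.add("mechanics", faulty),      # rejected by validation
    "momentum": toolbox.add("mechanics", momentum),
    "gas": toolbox.add("thermodynamics", gas),
}

selected = toolbox.select("mechanics kinetic energy of a moving mass")
result = selected.func(2.0, 3.0) if selected else None
```

In this sketch the faulty tool is discarded because it does not reproduce its example, which mirrors the validation step in spirit only. Selecting a category before a tool keeps each decision small, which is one plausible reading of why a hierarchical layout could help tool selection.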