Large Language Models (LLMs) have demonstrated significant progress in utilizing external APIs as tools for various tasks. However, their tool-using ability is limited by the availability of suitable APIs and the instability of implicit reasoning, particularly when simultaneously engaging in reasoning about plans and actual calculations. To address these limitations, we propose CREATOR, a novel framework that empowers LLMs to create their own tools through documentation and code realization. CREATOR disentangles the LLM's ability into two distinct phases: abstract tool creation and concrete decision execution, which results in improved LLM performance. We evaluate CREATOR on two established benchmarks: MATH, which consists of challenging math competition problems, and TabMWP, which includes diverse tabular contents for problem-solving. Remarkably, CREATOR significantly outperforms existing chain-of-thought (CoT), program-of-thought (PoT), and tool-using baselines on these two benchmarks. Additionally, we present a new dataset, Creation Challenge, comprising 2K diverse questions, to highlight the necessity and benefits of LLMs' tool creation ability in effectively addressing these problems. Furthermore, our research reveals that leveraging LLMs as tool creators facilitates knowledge transfer, and LLMs exhibit varying levels of tool creation abilities, enabling them to flexibly tackle diverse situations. Our study represents a promising avenue for maximizing the potential of LLMs and advancing toward truly intelligent and adaptable AI systems.
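To make the separation between abstract tool creation and concrete decision execution more tangible, the following is a minimal sketch, not the CREATOR implementation. It assumes the two LLM calls can be replaced by hard-coded stubs; the names `Tool`, `create_tool`, `decide_call`, and `execute`, as well as the example problem, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """A created tool: documentation for the decision phase, code for execution."""
    name: str
    documentation: str
    source: str


def create_tool(problem: str) -> Tool:
    # Phase 1 (abstract tool creation). Hypothetical stand-in for an LLM call:
    # it returns a general-purpose tool rather than the answer to `problem`.
    source = (
        "def solve_linear(a, b, c):\n"
        "    # Solve a*x + b = c for x.\n"
        "    return (c - b) / a\n"
    )
    doc = "solve_linear(a, b, c): return x such that a*x + b = c."
    return Tool(name="solve_linear", documentation=doc, source=source)


def decide_call(problem: str, tool: Tool) -> str:
    # Phase 2 (concrete decision). Hypothetical stand-in for a second LLM call
    # that sees the problem and the tool documentation, and decides how to
    # invoke the tool with concrete arguments.
    return "answer = solve_linear(3, 4, 19)"


def execute(tool: Tool, decision: str):
    # Run the tool definition and the decision code in one shared namespace.
    namespace = {}
    exec(tool.source, namespace)
    exec(decision, namespace)
    return namespace["answer"]


problem = "If 3x + 4 = 19, what is x?"
tool = create_tool(problem)
decision = decide_call(problem, tool)
answer = execute(tool, decision)
print(answer)
```

In this sketch the planning step (which tool to write) and the calculation step (which arguments to pass) never share a single generation, which is the kind of disentanglement the abstract describes. A real system would replace both stubs with model calls and would need to sandbox the executed code.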