Large Language Models (LLMs) have made significant progress in tool use, but this ability is limited by the availability of suitable APIs and by the instability of implicit reasoning, particularly when a task requires both planning and execution. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools through documentation and code realization. CREATOR disentangles abstract tool creation from concrete decision execution, which improves performance. We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and diverse tabular contents, respectively. CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to highlight the necessity and benefits of LLMs' tool creation ability. Further analysis shows that using LLMs as tool creators facilitates knowledge transfer, and that LLMs exhibit varying levels of tool creation ability, which enables them to adapt to diverse situations. Tool creation transforms the LLM's problem-solving paradigm, moving us closer to the next frontier of artificial intelligence. All code and data are released.
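To make the separation between abstract tool creation and concrete decision execution more tangible, here is a minimal Python sketch under stated assumptions. It is not CREATOR's implementation: the tool source is hard-coded where an LLM would generate it, and the names `create_tool`, `execute_decision`, and `solve_quadratic` are hypothetical. The docstring stands in for the tool's documentation and the function body for its code realization.

```python
import math
from typing import Callable

# Hypothetical tool source (assumption): stands in for what an LLM might write
# during the creation stage, i.e. documentation (docstring) plus code realization.
TOOL_SOURCE = '''
def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0, sorted ascending."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})
'''


def create_tool(source: str, name: str) -> Callable:
    """Creation stage (hypothetical): turn general tool source into a callable,
    without reference to any specific question."""
    namespace = {"math": math}
    exec(source, namespace)  # defines the tool inside an isolated namespace
    return namespace[name]


def execute_decision(tool: Callable, *args):
    """Decision stage (hypothetical): apply the created tool to the concrete
    values extracted from one question."""
    return tool(*args)


tool = create_tool(TOOL_SOURCE, "solve_quadratic")
print(tool.__doc__)
print(execute_decision(tool, 1, -5, 6))  # [2.0, 3.0]
```

Because the tool is created once and only the arguments change per question, the same function can be reused across problems of the same form, which is one way to read the knowledge-transfer claim above.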