Humans possess an extraordinary ability to create and use tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to become as adept at tool use as humans. This paradigm, tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve greater accuracy, efficiency, and automation in problem-solving. Despite this potential, the field still lacks a comprehensive understanding of its key challenges, opportunities, and future directions. To this end, we present a systematic investigation of tool learning. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift brought by foundation models, and the complementary roles of tools and models. We then categorize existing tool learning research into tool-augmented and tool-oriented learning. We formulate a general tool learning framework: starting from an understanding of the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and solve each subtask by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and how to facilitate generalization in tool learning. Because prior work lacks a systematic evaluation of tool learning, we experiment with 18 representative tools and show the potential of current foundation models to use tools skillfully. Finally, we discuss several open problems in tool learning that require further investigation. Overall, we hope this paper will inspire future research on integrating tools with foundation models.
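To make the framework above concrete, the following is a minimal sketch of the decompose, select, and adjust loop. It is an illustration under stated assumptions rather than the implementation studied in this paper: the names `Tool`, `decompose`, `rank_tools`, and `solve`, the keyword-matching heuristic, and the two toy tools are all hypothetical. In a real system, a foundation model would perform the decomposition, tool selection, and plan adjustment.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tool:
    name: str
    keywords: List[str]        # hypothetical: words suggesting the tool applies
    run: Callable[[str], str]  # raises ValueError if it cannot handle the subtask


def decompose(instruction: str) -> List[str]:
    """Stand-in planner: split the instruction into subtasks on ' then '."""
    return [p.strip() for p in instruction.split(" then ") if p.strip()]


def rank_tools(subtask: str, tools: List[Tool]) -> List[Tool]:
    """Stand-in selector: order tools by how many keywords match the subtask."""
    words = subtask.lower().split()
    return sorted(tools, key=lambda t: -sum(k in words for k in t.keywords))


def solve(instruction: str, tools: List[Tool]) -> List[Tuple[str, str, str]]:
    """Solve each subtask, falling back to the next-ranked tool on failure."""
    results = []
    for subtask in decompose(instruction):
        for tool in rank_tools(subtask, tools):
            try:
                results.append((subtask, tool.name, tool.run(subtask)))
                break
            except ValueError:
                continue  # plan adjustment: try the next candidate tool
        else:
            raise RuntimeError(f"no tool could handle: {subtask!r}")
    return results


def calculator(subtask: str) -> str:
    """Toy tool: sum every integer that appears in the subtask text."""
    numbers = [int(n) for n in re.findall(r"-?\d+", subtask)]
    if not numbers:
        raise ValueError("no numbers to add")
    return str(sum(numbers))


TOOLS = [
    Tool("calculator", ["add", "sum"], calculator),
    Tool("shouter", ["shout"], lambda s: s.upper()),
]

print(solve("add 2 and 3 then shout hello", TOOLS))
```

Here the fallback in `solve` stands in for dynamic plan adjustment: if the top-ranked tool rejects a subtask, the next candidate is tried before the loop moves on.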