Tool use enables large language models (LLMs) to access external information, invoke software systems, and act in digital environments, extending them beyond what model parameters alone can solve. Early research mainly studied whether a model could select and execute a single correct tool call. As agent systems evolve, however, the central problem has shifted from isolated invocation to multi-tool orchestration over long trajectories, which involves intermediate state, execution feedback, changing environments, and practical constraints such as safety, cost, and verifiability. We comprehensively review recent progress in multi-tool LLM agents and analyze the state of the art in this rapidly developing area. First, we unify task formulations and distinguish single-call tool use from long-horizon orchestration. We then organize the literature around six core dimensions: inference-time planning and execution, training and trajectory construction, safety and control, efficiency under resource constraints, capability completeness in open environments, and benchmark design and evaluation. We further summarize representative applications in software engineering, enterprise workflows, graphical user interfaces, and mobile systems. Finally, we discuss major challenges and outline future directions for building reliable, scalable, and verifiable multi-tool agents.