Tool learning empowers large language models (LLMs), acting as agents, to use external tools that extend their capabilities. Existing methods employ a single LLM-based agent to iteratively select and execute tools, then incorporate the result into the next action prediction. However, these methods still suffer performance degradation on complex tasks because (1) a single LLM has limited inherent capability to perform diverse actions, and (2) it struggles to adaptively correct mistakes when a task fails. To mitigate these problems, we propose ConAgents, a Cooperative and interactive Agents framework that modularizes the tool-learning workflow into Grounding, Execution, and Observing agents. We also introduce an iterative calibration (IterCali) method, enabling the agents to adapt based on feedback from the tool environment. Experiments on three datasets demonstrate the superiority of ConAgents (e.g., a 6-point improvement over the SOTA baseline). We further provide a fine-grained analysis of the efficiency and consistency of our framework.
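The workflow described above can be sketched in miniature as follows. This is a hypothetical illustration only, not the paper's implementation: the three agents are stubbed as plain Python functions (in ConAgents they are LLM-based), and `toy_env`, `run_with_itercali`, and the revision heuristic are all invented names for this sketch. What it shows is the division of labor (Grounding selects an action, Execution invokes the tool, Observing extracts the relevant output) and the iterative-calibration loop that revises an action using error feedback from the tool environment.

```python
# Hedged sketch of a cooperative three-agent tool-use loop with
# iterative calibration. All agent internals are stubs; in the paper
# each role is played by an LLM-based agent.

def grounding_agent(task):
    # Grounding: decompose the task and select a tool with arguments
    # (stubbed here as a fixed one-step plan).
    return {"tool": "search", "args": {"query": task}}

def execution_agent(action, env):
    # Execution: invoke the selected tool in the environment;
    # the environment may return an error message as feedback.
    return env(action["tool"], action["args"])

def observing_agent(raw_output):
    # Observing: extract the relevant part of a (possibly lengthy)
    # tool output for the next step.
    return raw_output.get("result")

def run_with_itercali(task, env, max_rounds=3):
    action = grounding_agent(task)
    for _ in range(max_rounds):
        feedback = execution_agent(action, env)
        if feedback.get("error"):
            # Iterative calibration: revise the action based on the
            # environment's feedback instead of failing outright.
            action["args"]["query"] = task + " (revised: " + feedback["error"] + ")"
            continue
        return observing_agent(feedback)
    return None  # give up after max_rounds calibration attempts

# Toy tool environment: fails until the query has been revised once.
def toy_env(tool, args):
    if "revised" not in args["query"]:
        return {"error": "missing parameter"}
    return {"result": "answer for " + args["query"]}

print(run_with_itercali("capital of France", toy_env))
```

In this toy run, the first execution fails, the calibration step rewrites the query using the error message, and the second execution succeeds; the same loop structure generalizes to multi-step tool chains.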