The success of large language models (LLMs) has encouraged exploration of LLM-augmented Autonomous Agents (LAAs). An LAA generates actions with its core LLM and interacts with environments, which lets it resolve complex tasks by conditioning on past interactions such as observations and actions. Because the investigation of LAAs is still very recent, few explorations are available. We therefore provide a comprehensive comparison of LAAs in terms of both agent architectures and LLM backbones. Additionally, we propose BOLAA, a new strategy for orchestrating multiple LAAs in which each labor LAA focuses on one type of action and a controller manages the communication among the agents. We conduct simulations on both decision-making and multi-step reasoning environments, which demonstrate the capacity of LAAs. Our performance results provide quantitative suggestions for designing LAA architectures, for choosing LLMs, and for the compatibility of the two. We release our implementation code of LAAs to the public at \url{https://github.com/salesforce/BOLAA}.
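The orchestration idea in the abstract, a controller routing each step to a labor agent that handles a single action type, can be illustrated with a minimal sketch. Everything named here (`LaborAgent`, `Controller`, `toy_select`, the `search[...]`/`click[...]` action format, and the lambda policies standing in for LLM calls) is a hypothetical illustration and not the API of the released BOLAA code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: maps a prompt string to an action argument.
Policy = Callable[[str], str]


@dataclass
class LaborAgent:
    """An agent restricted to a single action type (assumed naming)."""
    action_type: str
    policy: Policy
    history: List[str] = field(default_factory=list)

    def act(self, observation: str) -> str:
        # Condition on past interactions (observations and actions) plus the new observation.
        prompt = "\n".join(self.history + [observation])
        action = f"{self.action_type}[{self.policy(prompt)}]"
        self.history += [observation, action]
        return action


@dataclass
class Controller:
    """Selects which labor agent handles each observation."""
    agents: Dict[str, LaborAgent]
    select: Callable[[str], str]  # observation -> action type

    def step(self, observation: str) -> str:
        return self.agents[self.select(observation)].act(observation)


def toy_select(observation: str) -> str:
    # Toy routing rule, assumed for illustration only.
    return "click" if "results" in observation else "search"


controller = Controller(
    agents={
        "search": LaborAgent("search", lambda prompt: "red shoes"),
        "click": LaborAgent("click", lambda prompt: "item 1"),
    },
    select=toy_select,
)

first = controller.step("home page")
second = controller.step("results page")
print(first, second)
```

In this sketch each labor agent keeps its own interaction history, so the controller only decides which agent acts next; in a real system the selection rule and the policies would presumably be LLM-driven as well.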