Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains. However, their capacity for planning and communication in multi-agent cooperation remains unclear, even though these are crucial skills for intelligent embodied agents. In this paper, we present a novel framework that uses LLMs for multi-agent cooperation, and we test it in various embodied environments. Our framework enables embodied agents to plan, communicate, and cooperate with other embodied agents or humans to accomplish long-horizon tasks efficiently. We demonstrate that, within our framework, recent LLMs such as GPT-4 can surpass strong planning-based methods and exhibit emergent, effective communication without fine-tuning or few-shot prompting. We also find that LLM-based agents that communicate in natural language earn more trust from humans and cooperate with them more effectively. Our research underscores the potential of LLMs for embodied AI and lays the foundation for future research in multi-agent cooperation. Videos can be found on the project website: https://vis-www.cs.umass.edu/Co-LLM-Agents/.