Evaluating Multi-Agent Coordination Abilities in Large Language Models

A pivotal aim in contemporary AI research is to develop agents proficient in multi-agent coordination, enabling effective collaboration with both humans and other systems. Large Language Models (LLMs), with their notable ability to understand, generate, and interpret language in a human-like manner, stand out as promising candidates for building such agents. In this study, we build LLM-based agents and assess their effectiveness across a variety of coordination scenarios. We introduce the LLM-Coordination (LLM-Co) Framework, designed specifically to enable LLMs to play coordination games. Using the LLM-Co framework, we evaluate agents in three game environments and organize the evaluation into five aspects: Theory of Mind, Situated Reasoning, Sustained Coordination, Robustness to Partners, and Explicit Assistance. First, the evaluation of Theory of Mind and Situated Reasoning reveals the ability of LLMs to infer a partner's intentions and to reason about their own actions accordingly. Next, the evaluation of Sustained Coordination and Robustness to Partners further demonstrates that LLMs can coordinate with unknown partners in complex, long-horizon tasks, outperforming Reinforcement Learning (RL) baselines. Finally, to test Explicit Assistance, which refers to an agent's ability to offer help proactively, we introduce two novel layouts into the Overcooked-AI benchmark that examine whether agents prioritize helping their partners, even at the cost of time they could have spent on their own tasks. This research underscores the promising capabilities of LLMs in sophisticated coordination environments and reveals their potential for building strong real-world agents for multi-agent coordination.

翻译：当代人工智能研究的一个关键目标是开发精通多智能体协调的智能体，使其能够与人类及其他系统有效协作。大型语言模型（LLMs）因其在理解、生成和解读语言方面展现出接近人类的显著能力，成为构建此类智能体的理想候选。本研究构建并评估了利用LLMs打造的智能体在多种协调场景中的有效性。我们提出了专门用于使LLMs能够进行协调游戏的LLM-Coordination（LLM-Co）框架。借助该框架，我们在三种游戏环境中开展评估，并将评估分为五个维度：心智理论、情境推理、持续协调、合作伙伴鲁棒性以及显式协助。首先，对心智理论和情境推理的评估揭示了LLMs推断合作伙伴意图并据此推理行动的能力。随后，围绕持续协调和合作伙伴鲁棒性的评估进一步展示了LLMs与未知合作伙伴在复杂长周期任务中的协调能力，其表现优于强化学习基线。最后，为测试显式协助（即智能体主动提供帮助的能力），我们在Overcooked-AI基准测试中引入两种新布局，检验智能体是否能够优先帮助合作伙伴，即使牺牲自身任务执行时间。本研究凸显了LLMs在复杂协调环境中的卓越能力，并揭示了其在构建面向多智能体协调的强大现实世界智能体方面的潜力。