Large Language Models (LLMs) can perform complex scheduling in a multi-agent system and coordinate the agents to complete sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community lacks sufficient benchmarks for building general multi-agent collaboration infrastructure that encompasses both LLM and human-NPC collaboration. In this work, we propose a novel infrastructure, MindAgent, to evaluate emergent planning and coordination capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming frameworks to i) require understanding of the coordinator for a multi-agent system, ii) collaborate with human players via proper instructions without fine-tuning, and iii) establish in-context learning from few-shot prompts with feedback. Furthermore, we introduce CUISINEWORLD, a new gaming scenario and related benchmark that measures multi-agent collaboration efficiency and supervises multiple agents playing the game simultaneously. We conduct comprehensive evaluations with a new automatic metric, CoS, for calculating collaboration efficiency. Finally, our infrastructure can be deployed in real-world gaming scenarios through a customized VR version of CUISINEWORLD and adapted to the broader existing Minecraft gaming domain. We hope our findings on LLMs and the new infrastructure for general-purpose scheduling and coordination can help shed light on how such skills can be obtained by learning from large language corpora.