Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. A promising but largely under-explored area is their potential to facilitate human coordination with many agents. Such capabilities would be useful in domains including disaster response, urban planning, and real-time strategy scenarios. In this work, we introduce (1) a real-time strategy game benchmark designed to evaluate these abilities and (2) a novel framework we term HIVE. HIVE empowers a single human to coordinate swarms of up to 2,000 agents using natural language dialog with an LLM. We present promising results on this multi-agent benchmark, with our hybrid approach solving tasks such as coordinating agent movements, exploiting unit weaknesses, leveraging human annotations, and understanding terrain and strategic points. However, our findings also highlight critical limitations of current models, including difficulties in processing spatial visual information and challenges in formulating long-term strategic plans. This work sheds light on the potential and limitations of LLMs in human-swarm coordination, paving the way for future research in this area. The HIVE project page, which includes videos of the system in action, can be found here: hive.syrkis.com.