A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that can also reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) applied to four coordination-dependent multi-agent 2D task scenarios with increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to larger numbers of agents. We further demonstrate the hybrid frameworks in 3D simulations, where the vision-to-text problem and dynamical errors are considered. See our project website https://yongchao98.github.io/MIT-REALM-Multi-Robot/ for prompts, videos, and code.