A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered. See our project website https://yongchao98.github.io/MIT-REALM-Multi-Robot/ for prompts, videos, and code.