In advanced human-robot interaction tasks, visual target navigation is crucial for autonomous robots operating in unknown environments. Although numerous approaches have been developed, most are designed for single-robot operation, and such approaches often suffer from reduced efficiency and robustness in complex environments. Furthermore, learning policies for multi-robot collaboration is resource-intensive. To address these challenges, we propose Co-NavGPT, a framework that integrates Large Language Models (LLMs) as a global planner for multi-robot cooperative visual target navigation. Co-NavGPT encodes data from the explored environment into prompts, enhancing the LLM's scene comprehension. It then assigns exploration frontiers to each robot for efficient target search. Experimental results on Habitat-Matterport 3D (HM3D) show that Co-NavGPT surpasses existing models in success rate and efficiency without any training, demonstrating the considerable potential of LLMs in multi-robot collaboration. The supplementary video, prompts, and code are available at \href{https://sites.google.com/view/co-navgpt}{https://sites.google.com/view/co-navgpt}.
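The abstract does not specify the prompt format or how the LLM's answer becomes per-robot goals. The following is a minimal sketch under stated assumptions, not Co-NavGPT's actual implementation. It assumes that each frontier is summarized by a 2D centroid plus nearby object labels, and that the LLM is asked to reply with one `robot <i>: frontier <j>` line per robot. The names `Frontier`, `build_prompt`, `parse_assignment`, and `greedy_assign` are hypothetical. No LLM is called; the reply is a hard-coded stand-in. The nearest-frontier fallback is a swapped-in heuristic used only when the reply cannot be parsed.

```python
import math
import re
from dataclasses import dataclass, field


@dataclass
class Frontier:
    """Hypothetical frontier summary: map-frame centroid plus nearby object labels."""
    x: float
    y: float
    nearby_objects: list = field(default_factory=list)


def build_prompt(robot_positions, frontiers, target):
    """Serialize robot poses and frontiers into text for an LLM (assumed format)."""
    lines = [f"Goal: find a {target}. Assign each robot one distinct frontier."]
    for i, (rx, ry) in enumerate(robot_positions):
        lines.append(f"robot {i} at ({rx:.1f}, {ry:.1f})")
    for j, f in enumerate(frontiers):
        objs = ", ".join(f.nearby_objects) or "none"
        lines.append(f"frontier {j} at ({f.x:.1f}, {f.y:.1f}); nearby: {objs}")
    lines.append("Answer with one line per robot: 'robot <i>: frontier <j>'.")
    return "\n".join(lines)


def parse_assignment(reply, n_robots, n_frontiers):
    """Extract {robot: frontier} from the reply; None if incomplete, out of range, or duplicated."""
    pairs = re.findall(r"robot\s+(\d+)\s*:\s*frontier\s+(\d+)", reply, re.IGNORECASE)
    assignment = {}
    for r, f in pairs:
        r, f = int(r), int(f)
        if r < n_robots and f < n_frontiers:
            assignment[r] = f
    if len(assignment) != n_robots or len(set(assignment.values())) != n_robots:
        return None
    return assignment


def greedy_assign(robot_positions, frontiers):
    """Fallback heuristic (not from the paper): each robot takes its nearest untaken frontier."""
    taken, assignment = set(), {}
    for i, (rx, ry) in enumerate(robot_positions):
        best = min(
            (j for j in range(len(frontiers)) if j not in taken),
            key=lambda j: math.hypot(frontiers[j].x - rx, frontiers[j].y - ry),
            default=None,
        )
        if best is not None:
            taken.add(best)
            assignment[i] = best
    return assignment


if __name__ == "__main__":
    robots = [(0.0, 0.0), (10.0, 0.0)]
    frontiers = [Frontier(9.0, 1.0, ["sofa"]), Frontier(1.0, 1.0, ["bed"])]
    prompt = build_prompt(robots, frontiers, "bed")
    reply = "robot 0: frontier 1\nrobot 1: frontier 0"  # stand-in for an LLM reply
    plan = parse_assignment(reply, len(robots), len(frontiers)) or greedy_assign(robots, frontiers)
    print(plan)  # {0: 1, 1: 0}
```

Validating the parsed reply, rather than trusting it, keeps the planner from sending two robots to the same frontier when the model's answer is malformed.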