Visual target navigation is a key capability for autonomous robots operating in unknown environments, particularly in advanced human-robot interaction tasks. Although many approaches have been proposed, most are designed for a single robot, which limits efficiency and robustness in complex environments. Moreover, learning policies for multi-robot collaboration is resource-intensive. To address these challenges, we propose Co-NavGPT, a framework that uses Large Language Models (LLMs) as a global planner for multi-robot cooperative visual target navigation. Co-NavGPT encodes the explored environment data into prompts to improve the LLM's scene comprehension, then assigns exploration frontiers to each robot for efficient target search. Experimental results on Habitat-Matterport 3D (HM3D) show that Co-NavGPT surpasses existing models in success rate and efficiency without any learning process, demonstrating the potential of LLMs in multi-robot collaboration. The supplementary video, prompts, and code are available at https://sites.google.com/view/co-navgpt
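The planning loop described above (encode the explored map into a prompt, then let the LLM assign a frontier to each robot) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the JSON reply format, the function names `build_prompt` and `parse_assignment`, and the nearest-frontier fallback are all assumptions made for illustration, and the LLM call itself is left out.

```python
import json
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def build_prompt(robots: Dict[str, Point], frontiers: Dict[str, Point],
                 seen_objects: List[str], target: str) -> str:
    """Serialize the explored-map summary into text (hypothetical format)."""
    lines = [
        f"Target object: {target}",
        "Objects observed so far: " + (", ".join(seen_objects) or "none"),
        "Robot positions (x, y):",
    ]
    lines += [f"  {name}: ({x:.1f}, {y:.1f})" for name, (x, y) in robots.items()]
    lines.append("Candidate frontiers (x, y):")
    lines += [f"  {name}: ({x:.1f}, {y:.1f})" for name, (x, y) in frontiers.items()]
    lines.append("Reply with a JSON object mapping each robot name to one frontier name.")
    return "\n".join(lines)


def parse_assignment(reply: str, robots: Dict[str, Point],
                     frontiers: Dict[str, Point]) -> Dict[str, str]:
    """Read the LLM reply; fall back to the nearest frontier (an assumption)
    for any robot whose assignment is missing or invalid."""
    try:
        proposed = json.loads(reply)
    except json.JSONDecodeError:
        proposed = {}
    if not isinstance(proposed, dict):
        proposed = {}

    assignment = {}
    for name, pos in robots.items():
        choice = proposed.get(name)
        if not isinstance(choice, str) or choice not in frontiers:
            # Geometric fallback: pick the closest frontier by Euclidean distance.
            choice = min(frontiers, key=lambda f: math.dist(pos, frontiers[f]))
        assignment[name] = choice
    return assignment


# Toy scene with two robots and two frontiers (made-up coordinates).
robots = {"robot_0": (0.0, 0.0), "robot_1": (5.0, 5.0)}
frontiers = {"frontier_0": (1.0, 0.0), "frontier_1": (6.0, 5.0)}
prompt = build_prompt(robots, frontiers, ["sofa", "table"], "chair")
```

The string returned by `build_prompt` would be sent to whichever LLM is used as the global planner, and its reply passed to `parse_assignment`. Validating the reply against the known frontier names keeps a malformed answer from stalling the robots, which is a design choice of this sketch rather than something stated in the abstract.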