Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier-assignment strategies with limited inter-robot coordination. In this work, we present Coordinated Multi-Robot Exploration and Search using Vision Language Models (COMRES-VLM), a novel framework that leverages Vision Language Models (VLMs) to intelligently coordinate an MRS tasked with efficient exploration and target object search. COMRES-VLM integrates real-time frontier cluster extraction and topological skeleton analysis with VLM reasoning over shared occupancy maps, robot states, and optional natural language priors to generate globally consistent waypoint assignments. Extensive experiments in large-scale simulated indoor environments with up to six robots demonstrate that COMRES-VLM consistently outperforms state-of-the-art coordination methods, including Capacitated Vehicle Routing Problem (CVRP)-based and Voronoi-based planners, achieving 10.2\% faster exploration completion and 55.7\% higher object search efficiency. Notably, COMRES-VLM supports natural language-driven object search, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.