This paper investigates the multi-agent cooperative exploration problem, which requires multiple agents to explore an unseen environment from sensory signals within a limited time. A popular approach to exploration tasks is to combine active mapping with planning. Metric maps capture fine-grained spatial detail, but they incur high communication overhead and can vary significantly across scenarios, which leads to poor generalization. Topological maps are a promising alternative: they consist only of nodes and edges that encode abstract but essential information, and they are less sensitive to scene structure. However, most existing topology-based exploration methods rely on classical planners, which are time-consuming and sub-optimal because of their handcrafted design. Deep reinforcement learning (DRL) has shown great potential for learning (near-)optimal policies with fast end-to-end inference. In this paper, we propose Multi-Agent Neural Topological Mapping (MANTM) to improve exploration efficiency and generalization in multi-agent exploration tasks. MANTM consists of two main components: a Topological Mapper and a novel RL-based Hierarchical Topological Planner (HTP). The Topological Mapper uses a visual encoder and distance-based heuristics to construct a graph containing main nodes and their corresponding ghost nodes. The HTP uses graph neural networks to capture correlations between agents and graph nodes in a coarse-to-fine manner for effective global goal selection. Extensive experiments in the physically realistic Habitat simulator show that, in unseen scenarios, MANTM reduces the number of exploration steps by at least 26.40% compared with planning-based baselines and by at least 7.63% compared with RL-based competitors.
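To make the main-node/ghost-node graph and the coarse-to-fine goal selection more concrete, here is a minimal Python sketch under stated assumptions; it is not the MANTM implementation. It assumes agent positions are 2-D coordinates and uses one plausible distance-based heuristic: a new main node is added when the agent is farther than a threshold from every existing main node, ghost nodes are spawned on a ring around each new main node as unexplored candidates, and ghost nodes are pruned once an agent comes close to them. The class `TopoGraph`, the function `select_goal`, and the parameters `add_radius`, `ghost_offset`, and `num_ghosts` are hypothetical. The visual encoder is omitted, and the learned GNN-based HTP is replaced by a greedy nearest-node rule that only mimics its coarse-to-fine structure (first pick a main node, then one of its ghost nodes).

```python
import math
from dataclasses import dataclass, field


@dataclass
class TopoGraph:
    """Hypothetical topological map: main nodes are visited places,
    ghost nodes are unexplored candidate positions attached to a main node."""
    add_radius: float = 1.5    # assumed: minimum spacing between main nodes (metres)
    ghost_offset: float = 2.0  # assumed: distance from a main node to its ghost nodes
    num_ghosts: int = 4        # assumed: ghost nodes spawned per main node
    main_nodes: list = field(default_factory=list)  # list of (x, y)
    edges: set = field(default_factory=set)         # undirected (i, j) with i < j
    ghosts: dict = field(default_factory=dict)      # main-node index -> list of (x, y)

    def _add_main(self, pos):
        idx = len(self.main_nodes)
        self.main_nodes.append(pos)
        ring = []
        for k in range(self.num_ghosts):
            a = 2 * math.pi * k / self.num_ghosts
            g = (pos[0] + self.ghost_offset * math.cos(a),
                 pos[1] + self.ghost_offset * math.sin(a))
            # Skip candidates that fall on an already-visited place.
            if all(math.dist(g, p) >= self.add_radius for p in self.main_nodes):
                ring.append(g)
        self.ghosts[idx] = ring
        return idx

    def _prune_ghosts(self, pos):
        # Ghost nodes near the agent count as explored and are removed.
        self.ghosts = {i: [g for g in ring if math.dist(g, pos) >= self.add_radius]
                       for i, ring in self.ghosts.items()}

    def update(self, pos, prev_idx=None):
        """Localize pos to a main node (adding one if needed) and return its index."""
        if self.main_nodes:
            d, i = min((math.dist(pos, p), i) for i, p in enumerate(self.main_nodes))
            idx = i if d < self.add_radius else self._add_main(pos)
        else:
            idx = self._add_main(pos)
        if prev_idx is not None and prev_idx != idx:
            self.edges.add((min(prev_idx, idx), max(prev_idx, idx)))
        self._prune_ghosts(pos)
        return idx


def select_goal(graph, agent_pos):
    """Greedy stand-in for coarse-to-fine selection (NOT the learned HTP):
    the coarse step picks the nearest main node that still has ghost nodes,
    the fine step picks that node's ghost nearest to the agent."""
    candidates = [i for i, ring in graph.ghosts.items() if ring]
    if not candidates:
        return None  # nothing left to explore
    coarse = min(candidates, key=lambda i: math.dist(agent_pos, graph.main_nodes[i]))
    return min(graph.ghosts[coarse], key=lambda g: math.dist(agent_pos, g))


if __name__ == "__main__":
    g = TopoGraph()
    a = g.update((0.0, 0.0))
    b = g.update((2.0, 0.0), prev_idx=a)
    print(g.edges, select_goal(g, (2.0, 0.0)))
```

In this sketch, a planner chooses among ghost nodes rather than raw map cells, so the candidate set grows with the number of visited places rather than with the metric extent of the scene; with several agents, each would call `select_goal` on a shared `TopoGraph`.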