This paper presents CORE, a conceptually simple, effective, and communication-efficient model for multi-agent cooperative perception. It approaches the task from the novel perspective of cooperative reconstruction, based on two key insights: 1) cooperating agents together provide a more holistic observation of the environment, and 2) this holistic observation can serve as valuable supervision that explicitly guides the model in learning how to reconstruct the ideal observation through collaboration. CORE instantiates this idea with three major components: a compressor for each agent that creates a more compact feature representation for efficient broadcasting, a lightweight attentive collaboration component for cross-agent message aggregation, and a reconstruction module that reconstructs the observation from the aggregated feature representations. This learning-to-reconstruct idea is task-agnostic and offers clear, reasonable supervision that encourages more effective collaboration, ultimately benefiting downstream perception tasks. We validate CORE on OPV2V, a large-scale multi-agent perception dataset, on two tasks: 3D object detection and semantic segmentation. Results demonstrate that the model achieves state-of-the-art performance on both tasks while being more communication-efficient.
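The abstract names CORE's three components but not their internals. The toy NumPy sketch below only illustrates how a compress, aggregate, reconstruct pipeline with a reconstruction loss could fit together; it is not CORE's implementation. Every design choice here is our own assumption: the compressor is a channel-reducing linear projection (analogous to a 1x1 convolution) with a matching decompressor on the receiving side, collaboration is per-cell softmax attention against the ego agent's features, the reconstruction head is a linear projection followed by a sigmoid, and the "holistic observation" is approximated as the union of per-agent binary occupancy grids. All function names (`compress`, `attentive_fusion`, `reconstruct`, ...) and shapes are hypothetical.

```python
import numpy as np

# Hypothetical compressor: reduce C channels to Cc per BEV cell (like a 1x1 conv).
def compress(feat, w_enc):
    # feat: (C, H, W), w_enc: (Cc, C) -> code: (Cc, H, W)
    return np.einsum("kc,chw->khw", w_enc, feat)

# Hypothetical decompressor applied by the receiving agent.
def decompress(code, w_dec):
    # code: (Cc, H, W), w_dec: (C, Cc) -> (C, H, W)
    return np.einsum("ck,khw->chw", w_dec, code)

# Assumed attentive collaboration: per-cell softmax over agents,
# scored by scaled dot product with the ego agent's features.
def attentive_fusion(ego, others):
    feats = np.stack([ego] + list(others))              # (N, C, H, W)
    scores = np.einsum("chw,nchw->nhw", ego, feats) / np.sqrt(ego.shape[0])
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # sums to 1 over agents
    fused = np.einsum("nhw,nchw->chw", weights, feats)  # (C, H, W)
    return fused, weights

# Assumed reconstruction head: linear projection to one channel, then sigmoid.
def reconstruct(fused, w_rec):
    logits = np.einsum("c,chw->hw", w_rec, fused)        # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))

# Reconstruction supervision: mean squared error against the holistic target.
def reconstruction_loss(pred, holistic):
    return float(np.mean((pred - holistic) ** 2))

# --- toy usage with random data ---
rng = np.random.default_rng(0)
C, Cc, H, W, n_agents = 16, 4, 8, 8, 3
feats = [rng.standard_normal((C, H, W)) for _ in range(n_agents)]
w_enc = rng.standard_normal((Cc, C)) / np.sqrt(C)
w_dec = rng.standard_normal((C, Cc)) / np.sqrt(Cc)
w_rec = rng.standard_normal(C) / np.sqrt(C)

# Each non-ego agent broadcasts a compressed code; the ego agent decompresses it.
codes = [compress(f, w_enc) for f in feats[1:]]
received = [decompress(c, w_dec) for c in codes]
fused, weights = attentive_fusion(feats[0], received)
pred = reconstruct(fused, w_rec)

# Hypothetical holistic target: union of per-agent binary occupancy grids.
occupancy = [rng.random((H, W)) > 0.7 for _ in range(n_agents)]
holistic = np.logical_or.reduce(occupancy).astype(float)
loss = reconstruction_loss(pred, holistic)
```

In this toy setup each broadcast code carries `Cc / C = 1/4` of the values of a full feature map, which is the kind of bandwidth saving a compressor is meant to provide; in a trained model the projections would be learned jointly with the reconstruction loss rather than sampled at random.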