This paper presents CORE, a conceptually simple, effective, and communication-efficient model for multi-agent cooperative perception. It approaches the task from the novel perspective of cooperative reconstruction, based on two key insights: 1) cooperating agents together provide a more holistic observation of the environment, and 2) this holistic observation can serve as valuable supervision that explicitly guides the model in learning to reconstruct the ideal observation through collaboration. CORE instantiates the idea with three major components: a compressor that lets each agent create a more compact feature representation for efficient broadcasting, a lightweight attentive collaboration component for cross-agent message aggregation, and a reconstruction module that reconstructs the observation from the aggregated feature representations. This learning-to-reconstruct idea is task-agnostic and offers clear and reasonable supervision that encourages more effective collaboration, ultimately benefiting perception tasks. We validate CORE on OPV2V, a large-scale multi-agent perception dataset, on two tasks: 3D object detection and semantic segmentation. Results demonstrate that the model achieves state-of-the-art performance on both tasks while being more communication-efficient.
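The three components named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the 1x1 linear compressor, the per-cell dot-product attention, the linear decoder, the mean-squared-error loss, and the element-wise maximum used as a stand-in for the holistic observation are all assumptions chosen for illustration, as are every function and parameter name below.

```python
import numpy as np


def compress(feature, w_enc):
    """Hypothetical compressor: 1x1 linear projection (C, H, W) -> (K, H, W), K < C."""
    return np.einsum("kc,chw->khw", w_enc, feature)


def attentive_fuse(ego, messages):
    """Hypothetical collaboration step: per-cell softmax attention over agents.

    Each agent's compressed feature is scored against the ego feature with a
    scaled dot product over channels, and the scores are normalised per cell.
    """
    stack = np.stack([ego] + list(messages))            # (N, K, H, W)
    scale = np.sqrt(ego.shape[0])
    scores = (stack * ego[None]).sum(axis=1) / scale    # (N, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    fused = (weights[:, None] * stack).sum(axis=0)      # (K, H, W)
    return fused, weights


def reconstruct(fused, w_dec):
    """Hypothetical reconstruction module: linear decoder (K, H, W) -> (C, H, W)."""
    return np.einsum("ck,khw->chw", w_dec, fused)


def reconstruction_loss(pred, target):
    """Mean squared error against the holistic target (an assumed loss)."""
    return float(np.mean((pred - target) ** 2))


def demo(seed=0, n_agents=3, c=8, k=2, h=4, w=4):
    rng = np.random.default_rng(seed)
    # Random per-agent feature maps stand in for real sensor features.
    feats = [rng.normal(size=(c, h, w)) for _ in range(n_agents)]
    w_enc = rng.normal(size=(k, c)) / np.sqrt(c)
    w_dec = rng.normal(size=(c, k)) / np.sqrt(k)

    # Assumed stand-in for the holistic observation: element-wise max over agents.
    target = np.max(np.stack(feats), axis=0)

    msgs = [compress(f, w_enc) for f in feats]           # what each agent broadcasts
    fused, weights = attentive_fuse(msgs[0], msgs[1:])   # agent 0 acts as ego
    recon = reconstruct(fused, w_dec)

    return {
        "message_shape": msgs[0].shape,
        "recon_shape": recon.shape,
        "weights": weights,
        "loss": reconstruction_loss(recon, target),
        "compression_ratio": feats[0].size / msgs[0].size,
    }


if __name__ == "__main__":
    result = demo()
    print(result["message_shape"], result["recon_shape"], result["compression_ratio"])
```

In this sketch the communication saving comes solely from broadcasting K channels instead of C. The reconstruction loss would be minimised with respect to the encoder and decoder weights during training, which is omitted here.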