Collaborative perception enables agents to share complementary perceptual information with nearby agents, which improves perception performance and alleviates the limitations of single-view perception, such as occlusion and sparsity. Most existing approaches focus on a single modality (especially LiDAR) and do not fully exploit the advantages of multi-modal perception. We propose a collaborative perception paradigm, BM2CP, which employs both LiDAR and camera to achieve efficient multi-modal perception. It uses LiDAR-guided modal fusion, cooperative depth generation, and modality-guided intermediate fusion to capture deep interactions among the modalities of different agents. Moreover, it can cope with the special case where one of the sensors, of the same or a different type, of any agent is missing. Extensive experiments validate that our approach outperforms state-of-the-art methods with 50 times lower communication volume in both simulated and real-world autonomous driving scenarios. Our code is available at https://github.com/byzhaoAI/BM2CP.