Collaborative perception aims to extend sensing coverage and improve perception accuracy by sharing information among multiple agents. However, due to differences in viewpoints and spatial positions, agents often acquire heterogeneous observations. Existing intermediate fusion methods primarily focus on aligning similar features, often overlooking the perceptual diversity among agents. To address this limitation, we propose CoBEVMoE, a novel collaborative perception framework that operates in the Bird's Eye View (BEV) space and incorporates a Dynamic Mixture-of-Experts (DMoE) architecture. In DMoE, each expert is dynamically generated based on the input features of a specific agent, enabling it to extract distinctive and reliable cues while attending to shared semantics. This design allows the fusion process to explicitly model both feature similarity and heterogeneity across agents. Furthermore, we introduce a Dynamic Expert Metric Loss (DEML) to enhance inter-expert diversity and improve the discriminability of the fused representation. Extensive experiments on the OPV2V and DAIR-V2X-C datasets demonstrate that CoBEVMoE achieves state-of-the-art performance. Specifically, it improves the IoU for camera-based BEV segmentation by +1.5% on OPV2V and the AP@0.5 for LiDAR-based 3D object detection by +3.0% on DAIR-V2X-C, verifying the effectiveness of expert-based heterogeneous feature modeling in multi-agent collaborative perception. The source code will be made publicly available at https://github.com/godk0509/CoBEVMoE.
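The DMoE idea described above — each expert generated on the fly from its own agent's BEV features, with a gate weighting expert outputs per location — can be illustrated with a minimal NumPy sketch. This is an illustrative assumption of one plausible instantiation (pooled features driving a hypernetwork that emits a per-agent 1×1 linear expert), not the paper's actual CoBEVMoE implementation; the function and parameter names (`dynamic_moe_fuse`, `w_hyper`, `w_gate`) are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_moe_fuse(feats, w_hyper, w_gate):
    """Sketch of dynamic per-agent expert fusion (illustrative only).

    feats:   (N, C, H, W) ego-aligned BEV features from N agents.
    w_hyper: (C, C*C) hypernetwork generating each agent's expert
             (a per-agent 1x1 linear map) from its pooled feature.
    w_gate:  (C,) shared gating weights scoring expert outputs.
    Returns the fused BEV feature of shape (C, H, W).
    """
    n, c, h, w = feats.shape
    pooled = feats.mean(axis=(2, 3))                    # (N, C) per-agent summary
    experts = (pooled @ w_hyper).reshape(n, c, c)       # dynamically generated experts
    out = np.einsum('noc,nchw->nohw', experts, feats)   # each expert on its own agent
    scores = np.einsum('c,nchw->nhw', w_gate, out)      # (N, H, W) gate scores
    attn = softmax(scores, axis=0)[:, None]             # per-pixel weights over agents
    return (attn * out).sum(axis=0)                     # (C, H, W) fused feature
```

The per-pixel softmax over agents lets the fusion keep an agent-specific (heterogeneous) cue where one expert dominates, while averaging where experts agree.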
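The DEML objective mentioned above encourages inter-expert diversity. A simple hedged sketch of such a metric-style loss penalizes pairwise cosine similarity between expert parameters above a margin; the exact form of the paper's DEML may differ, and the name `expert_diversity_loss` and the margin formulation are assumptions for illustration.

```python
import numpy as np

def expert_diversity_loss(experts, margin=0.5):
    """Hypothetical inter-expert diversity loss (illustrative only).

    experts: (N, ...) parameter tensors of the N dynamically
             generated experts, one per agent.
    Penalizes pairwise cosine similarity exceeding `margin`,
    pushing experts to encode distinct, agent-specific cues.
    """
    n = experts.shape[0]
    flat = experts.reshape(n, -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)  # unit vectors
    sim = flat @ flat.T                                        # cosine similarities
    iu = np.triu_indices(n, k=1)                               # distinct pairs only
    return float(np.maximum(sim[iu] - margin, 0.0).mean())
```

Identical experts incur the maximum penalty (1 − margin), while mutually orthogonal experts incur none, which matches the stated goal of improving the discriminability of the fused representation.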