Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources. This paper investigates intermediate collaboration for MAP, focusing on what makes a "good" collaborative view (i.e., the post-collaboration feature) and how it relates to the individual views (i.e., the pre-collaboration features), a relationship that most existing works treat as a black box. We propose CMiMC (Contrastive Mutual Information Maximization for Collaborative Perception), a novel framework for intermediate collaboration. Its core idea is to preserve the discriminative information of individual views in the collaborative view by maximizing the mutual information between pre- and post-collaboration features, while improving the efficacy of the collaborative view by minimizing the loss function of downstream tasks. In particular, we define multi-view mutual information (MVMI) for intermediate collaboration, which measures the correlation between the collaborative view and the individual views at both global and local scales. We build CMiMNet on multi-view contrastive learning to estimate and maximize MVMI, which assists the training of a collaboration encoder for voxel-level feature fusion. On V2X-Sim 1.0, CMiMC improves the SOTA average precision by 3.08% and 4.44% at IoU (Intersection-over-Union) thresholds of 0.5 and 0.7, respectively. In addition, CMiMC can reduce the communication volume to 1/32 while achieving performance comparable to SOTA. Code and Appendix are released at https://github.com/77SWF/CMiMC.
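To make the idea of contrastive mutual information maximization concrete, the following is a minimal sketch of a generic InfoNCE-style objective between pre- and post-collaboration features. It is not the CMiMNet estimator or the MVMI definition from the paper; it is only a standard contrastive lower bound on mutual information, written under stated assumptions. The names `info_nce`, `mi_lower_bound`, `joint_objective`, the `temperature` value, and the weight `mi_weight` are all hypothetical, and features are assumed to be plain vectors where the i-th individual feature and the i-th collaborative feature form a positive pair.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale to unit length so that dot products are cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce(individual, collaborative, temperature=0.1):
    """Generic InfoNCE loss over N paired feature vectors (illustrative only).

    individual[i] and collaborative[i] are treated as the positive pair;
    collaborative[j] for j != i serve as negatives.
    """
    ind = [normalize(v) for v in individual]
    col = [normalize(v) for v in collaborative]
    n = len(ind)
    total = 0.0
    for i in range(n):
        logits = [dot(ind[i], col[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def mi_lower_bound(nce_loss, n):
    # Standard InfoNCE bound: I(X; Y) >= log(N) - L_NCE.
    return math.log(n) - nce_loss


def joint_objective(task_loss, nce_loss, mi_weight=1.0):
    # Assumed combination: minimize the task loss while maximizing the MI bound.
    return task_loss + mi_weight * nce_loss


# Toy check with synthetic features (no real perception data involved).
random.seed(0)
individual = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
# "Aligned" collaborative features keep each individual view's information.
aligned = [[x + 0.05 * random.gauss(0, 1) for x in v] for v in individual]
# "Shuffled" features pair each individual view with an unrelated one.
shuffled = aligned[1:] + aligned[:1]

loss_aligned = info_nce(individual, aligned)
loss_shuffled = info_nce(individual, shuffled)
bound_aligned = mi_lower_bound(loss_aligned, len(individual))
```

In this toy setup, the aligned features yield a lower contrastive loss (and hence a higher mutual information bound) than the shuffled ones, which mirrors the intuition that a collaborative view should retain the discriminative information of the individual views.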