Cooperative perception lets agents share information to expand coverage and improve scene understanding. However, in real-world scenarios, diverse and unpredictable corruptions undermine its robustness and generalization. To address these challenges, we introduce CoopDiff, a diffusion-based cooperative perception framework that mitigates corruptions via a denoising mechanism. CoopDiff adopts a teacher-student paradigm: the Quality-Aware Teacher performs voxel-level early fusion with Quality-of-Interest weighting and semantic guidance, then produces clean supervision features via a diffusion denoiser. The Dual-Branch Diffusion Student first separates the ego and cooperative streams during encoding to reconstruct the teacher's clean targets; an Ego-Guided Cross-Attention mechanism then enables balanced decoding under degradation by adaptively integrating ego and cooperative features. We evaluate CoopDiff on two constructed multi-degradation benchmarks, OPV2Vn and DAIR-V2Xn, each incorporating six corruption types spanning environmental and sensor-level distortions. Benefiting from the inherent denoising properties of diffusion, CoopDiff consistently outperforms prior methods across all degradation types and lowers the relative corruption error. Furthermore, it offers a tunable trade-off between precision and inference efficiency.
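To make the ego-guided fusion concrete, the sketch below illustrates one plausible reading of the Ego-Guided Cross-Attention step: ego features act as queries while cooperative features supply keys and values, so the ego stream adaptively gates how much (possibly degraded) cooperative content is admitted. This is a minimal, dependency-light illustration, not the paper's implementation; the function name, the single-head formulation, and the residual fusion are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ego_guided_cross_attention(ego_feat, coop_feat):
    """Hypothetical single-head sketch of ego-guided cross-attention.

    ego_feat:  (N_ego, D)  ego-agent features (queries)
    coop_feat: (N_coop, D) cooperative features (keys/values)
    Returns fused features of shape (N_ego, D).
    """
    d_k = ego_feat.shape[-1]
    # Ego queries score each cooperative feature; low-quality (degraded)
    # cooperative entries receive low attention weight.
    scores = ego_feat @ coop_feat.T / np.sqrt(d_k)   # (N_ego, N_coop)
    weights = softmax(scores, axis=-1)
    attended = weights @ coop_feat                   # (N_ego, D)
    # Residual fusion keeps the ego information intact under heavy degradation.
    return ego_feat + attended
```

In this reading, the residual connection is what provides the "balanced decoding": even if all cooperative features are corrupted, the output degrades gracefully toward the ego-only representation.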