Cooperative perception has attracted wide attention because it leverages information shared across connected automated vehicles (CAVs) and smart infrastructure to address sensing occlusion and range limitations. However, existing research overlooks how fragile multi-sensor correlations are in multi-agent settings: the sensor measurements of heterogeneous agents are highly susceptible to environmental factors, which weakens inter-agent sensor interactions. Varying operational conditions and other real-world factors inevitably introduce multifactorial noise and consequently lead to multi-sensor misalignment, making the deployment of multi-agent multi-modality perception particularly challenging in the real world. In this paper, we propose AgentAlign, a real-world heterogeneous agent cross-modality feature alignment framework, to address these multi-modality misalignment issues. Our method introduces a cross-modality feature alignment space (CFAS) and a heterogeneous agent feature alignment (HAFA) mechanism to dynamically harmonize multi-modality features across agents. Additionally, we present a novel V2XSet-Noise dataset that simulates realistic sensor imperfections under diverse environmental conditions, facilitating a systematic evaluation of our approach's robustness. Extensive experiments on the V2X-Real and V2XSet-Noise benchmarks demonstrate that our framework achieves state-of-the-art performance, underscoring its potential for real-world applications in cooperative autonomous driving. The controllable V2XSet-Noise dataset and generation pipeline will be released in the future.