4D millimeter-wave (mmWave) radar has been widely adopted in autonomous driving and robotic perception due to its low cost and all-weather robustness. However, point-cloud-based radar representations suffer from information loss caused by multi-stage signal processing, while directly consuming raw 4D radar tensors incurs prohibitive computational costs. To address these challenges, we propose WRCFormer, a novel 3D object detection framework that efficiently fuses raw 4D radar cubes with camera images via decoupled multi-view radar representations. Our approach introduces two key components: (1) a Wavelet Attention Module embedded in a wavelet-based Feature Pyramid Network (FPN), which enhances the representation of sparse radar signals and image data by capturing joint spatial-frequency features, thereby mitigating information loss while maintaining computational efficiency; and (2) a Geometry-guided Progressive Fusion mechanism, a two-stage query-based fusion strategy that progressively aligns multi-view radar and visual features through geometric priors, enabling modality-agnostic integration without excessive computational overhead. Extensive experiments on the K-Radar benchmark show that WRCFormer achieves state-of-the-art performance, surpassing the best existing model by approximately 2.4% across all scenarios and 1.6% in sleet conditions, demonstrating strong robustness in adverse weather.
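To make the spatial-frequency idea behind the Wavelet Attention Module concrete, the sketch below shows a single-level 2D Haar discrete wavelet transform of a feature map into four subbands (LL, LH, HL, HH), followed by an energy-based softmax reweighting of the subbands. This is only an illustrative sketch under our own assumptions: the actual module's wavelet basis, attention formulation, and placement inside the FPN are not specified in the abstract, and `subband_attention` is a hypothetical stand-in for the learned attention.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT of a (H, W) map with even H and W.

    Returns the four subbands (LL, LH, HL, HH): low-frequency
    approximation plus horizontal/vertical/diagonal details.
    """
    a = (x[0::2, :] + x[1::2, :]) / 2.0  # row-pair average
    d = (x[0::2, :] - x[1::2, :]) / 2.0  # row-pair detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def subband_attention(x):
    """Hypothetical subband reweighting (not the paper's exact module).

    Weights each wavelet subband by a softmax over its mean energy,
    emphasizing the frequency bands that carry more signal -- a crude
    proxy for learned spatial-frequency attention.
    """
    bands = haar_dwt2(x)
    energies = np.array([float((b ** 2).mean()) for b in bands])
    w = np.exp(energies - energies.max())
    w = w / w.sum()
    return [wi * b for wi, b in zip(w, bands)], w
```

For a sparse radar slice, most energy concentrates in a few subbands, so the softmax weights act as a cheap frequency-selective gate; a learned module would replace the energy statistic with trainable projections.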