SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with 4D Imaging Radar

4D millimeter-wave (mmWave) radar is a promising technology for vehicle sensing because it is cost-effective and operates reliably in adverse weather. However, its adoption has been hindered by the sparsity and noise of radar point cloud data. This paper introduces spatial multi-representation fusion (SMURF), a novel approach to 3D object detection using a single 4D imaging radar. SMURF leverages multiple representations of radar detection points, including pillarization and density features of a multi-dimensional Gaussian mixture distribution obtained through kernel density estimation (KDE). KDE mitigates the measurement inaccuracy caused by the limited angular resolution and multi-path propagation of radar signals, and it alleviates point cloud sparsity by capturing density features. Experimental evaluations on the View-of-Delft (VoD) and TJ4DRadSet datasets demonstrate the effectiveness and generalization ability of SMURF, which outperforms recently proposed single-representation models based on 4D imaging radar. Moreover, while using 4D imaging radar only, SMURF achieves performance comparable to the state-of-the-art 4D imaging radar and camera fusion-based method, with an increase of 1.22% in mean average precision on the bird's-eye view of the TJ4DRadSet dataset and 1.32% in 3D mean average precision over the entire annotated area of the VoD dataset. The method also meets real-time requirements, with an inference time of no more than 0.05 seconds for most scans on both datasets. This research highlights the benefits of 4D mmWave radar and provides a strong benchmark for subsequent work on 3D object detection with 4D imaging radar.
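
To make the two representations named above concrete, the following is a minimal sketch of how a per-point Gaussian KDE density feature and a pillar assignment could be computed for one radar scan. It is not the authors' implementation: the function names `gaussian_kde_density` and `pillar_indices`, the single isotropic `bandwidth`, and the grid parameters are all assumptions made for illustration, and the abstract does not specify the kernel settings or the pillar grid that SMURF actually uses.

```python
import numpy as np


def gaussian_kde_density(points, bandwidth=1.0):
    """Evaluate an isotropic Gaussian KDE at every input point.

    points: (N, D) array of radar detection coordinates.
    bandwidth: kernel standard deviation (hypothetical value, not from the paper).
    Returns an (N,) array of density estimates, one per point.
    """
    n, d = points.shape
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Normalizing constant of a D-dimensional isotropic Gaussian.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2) / norm
    # Averaging the kernels gives an equal-weight Gaussian mixture density.
    return kernel.mean(axis=1)


def pillar_indices(points, x_range, y_range, pillar_size):
    """Assign each point to a vertical pillar on a bird's-eye-view grid.

    x_range, y_range: (min, max) extents of the grid (assumed values).
    pillar_size: edge length of a square pillar (assumed value).
    Returns integer column and row indices plus a mask of in-range points.
    """
    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    nx = int(np.ceil((x_range[1] - x_range[0]) / pillar_size))
    ny = int(np.ceil((y_range[1] - y_range[0]) / pillar_size))
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    return ix, iy, valid


if __name__ == "__main__":
    # Two nearby detections and one isolated detection (synthetic data).
    scan = np.array([[0.5, 0.5, 0.0], [0.6, 0.5, 0.0], [10.0, 10.0, 0.0]])
    density = gaussian_kde_density(scan, bandwidth=1.0)
    ix, iy, valid = pillar_indices(scan, (0.0, 20.0), (0.0, 20.0), 1.0)
    print(density, ix, iy, valid)
```

In this sketch, a point with close neighbors receives a higher density value than an isolated one, which is the property that lets a density feature compensate for sparse returns. The pairwise distance matrix costs O(N²) memory, which is acceptable for the few hundred points typical of a radar scan but would need a neighbor search for denser inputs.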
