SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with 4D Imaging Radar

4D millimeter-wave (mmWave) radar is a promising technology for vehicle sensing because of its cost-effectiveness and its ability to operate in adverse weather conditions. However, its adoption has been hindered by the sparsity and noise of radar point cloud data. This paper introduces spatial multi-representation fusion (SMURF), a novel approach to 3D object detection using a single 4D imaging radar. SMURF leverages multiple representations of radar detection points, including pillarization and density features of a multi-dimensional Gaussian mixture distribution obtained through kernel density estimation (KDE). KDE effectively mitigates the measurement inaccuracy caused by the limited angular resolution and multi-path propagation of radar signals. It also helps alleviate point cloud sparsity by capturing density features. Experimental evaluations on the View-of-Delft (VoD) and TJ4DRadSet datasets demonstrate the effectiveness and generalization ability of SMURF, which outperforms recently proposed 4D imaging radar-based single-representation models. Moreover, while using 4D imaging radar only, SMURF achieves performance comparable to the state-of-the-art 4D imaging radar and camera fusion-based method, with an increase of 1.22% in bird's-eye-view mean average precision on the TJ4DRadSet dataset and 1.32% in 3D mean average precision over the entire annotated area of the VoD dataset. The proposed method also addresses the challenge of real-time detection, with an inference time of no more than 0.05 seconds for most scans on both datasets. This research highlights the benefits of 4D mmWave radar and provides a strong benchmark for subsequent work on 3D object detection with 4D imaging radar.
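
To illustrate the idea of a KDE-based density feature, the following is a minimal sketch and not the SMURF implementation. It assumes an isotropic Gaussian kernel with a single hand-picked bandwidth and evaluates the density at every radar point over all points in the scan. The function name `kde_density`, the bandwidth value, and the toy scan are hypothetical choices made only for this example.

```python
import numpy as np


def kde_density(points: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Gaussian KDE density evaluated at each point of a scan.

    points:    (n, d) array of point coordinates (e.g. x, y, z).
    bandwidth: kernel standard deviation (an assumed, hand-picked value).
    Returns an (n,) array of density values, one per point.
    """
    n, d = points.shape
    # Pairwise squared Euclidean distances between all points: shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Isotropic Gaussian kernel, normalized so the estimate integrates to 1.
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    norm = n * (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return kernel.sum(axis=1) / norm


# Toy scan: a tight cluster of 20 points (an "object") plus 5 scattered points.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=[10.0, 2.0, 0.5], scale=0.3, size=(20, 3))
clutter = rng.uniform(low=[0.0, -20.0, -2.0], high=[50.0, 20.0, 2.0], size=(5, 3))
scan = np.vstack([cluster, clutter])

density = kde_density(scan, bandwidth=0.5)
# Each point can now carry its density as an extra feature channel.
features = np.hstack([scan, density[:, None]])
```

In this sketch, points inside the cluster receive a much higher density than isolated points, which is the kind of signal a detector could use to tell objects from scattered noise. The all-pairs computation is quadratic in the number of points; that is acceptable for a toy scan, but a practical pipeline would likely restrict the sum to a local neighborhood.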
