As an emerging technology and a relatively affordable device, the 4D imaging radar has already proven effective for 3D object detection in autonomous driving. Nevertheless, the sparsity and noisiness of 4D radar point clouds hinder further performance improvement, and in-depth studies of its fusion with other modalities are lacking. Meanwhile, most camera-based perception methods transform the extracted image perspective-view features into the bird's-eye view geometrically via the "depth-based splatting" proposed in Lift-Splat-Shoot (LSS), and some researchers exploit other modalities such as LiDARs or ordinary automotive radars for enhancement. Recently, a few works have applied the "sampling" strategy for image view transformation, showing that it outperforms "splatting" even without image depth prediction. However, the potential of "sampling" has not been fully unleashed. In this paper, we investigate the "sampling" view transformation strategy for camera and 4D imaging radar fusion-based 3D object detection. In the proposed model, LXL, predicted image depth distribution maps and radar 3D occupancy grids are used to aid image view transformation, a scheme we call "radar occupancy-assisted depth-based sampling". Experiments on the VoD and TJ4DRadSet datasets show that the proposed method outperforms existing 3D object detection methods by a significant margin without bells and whistles. Ablation studies demonstrate that our method performs best among the different enhancement settings.
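To make the idea of "radar occupancy-assisted depth-based sampling" more concrete, the following is a minimal NumPy sketch, not the LXL implementation. The abstract does not specify how the depth distribution and the radar occupancy are combined, so the multiplicative weighting, the nearest-neighbour sampling, the pinhole camera model, and all names (`sample_voxel_features`, `depth_bins`, `radar_occ`, and so on) are assumptions made for illustration only.

```python
import numpy as np


def sample_voxel_features(img_feat, depth_prob, depth_bins,
                          radar_occ, voxel_centers, K):
    """Illustrative 'sampling' view transformation (hypothetical helper).

    img_feat:      (C, H, W) image feature map
    depth_prob:    (D, H, W) per-pixel probability over D depth bins
    depth_bins:    (D,) depth bin centres in metres
    radar_occ:     (N,) radar occupancy in [0, 1] for each voxel
    voxel_centers: (N, 3) voxel centres in camera coordinates (z forward)
    K:             (3, 3) pinhole camera intrinsics
    Returns (N, C) voxel features; voxels outside the image stay zero.
    """
    C, H, W = img_feat.shape
    N = voxel_centers.shape[0]
    out = np.zeros((N, C), dtype=np.float64)

    # Project every voxel centre onto the image plane.
    z = voxel_centers[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid division by zero
    uvw = voxel_centers @ K.T
    u = np.rint(uvw[:, 0] / safe_z).astype(int)
    v = np.rint(uvw[:, 1] / safe_z).astype(int)

    # Keep voxels that are in front of the camera and inside the image.
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.nonzero(valid)[0]

    # Look up the predicted probability of the depth bin nearest to each voxel.
    d_idx = np.abs(z[idx, None] - depth_bins[None, :]).argmin(axis=1)
    p_depth = depth_prob[d_idx, v[idx], u[idx]]

    # Assumption: depth confidence and radar occupancy are simply multiplied.
    weight = p_depth * radar_occ[idx]

    # Nearest-neighbour sampling of image features, scaled by the weight.
    out[idx] = img_feat[:, v[idx], u[idx]].T * weight[:, None]
    return out
```

In contrast to "splatting", which pushes each pixel's features forward along its camera ray, this sketch starts from predefined 3D voxel positions and pulls features from the image, which is the general sense in which "sampling" is used above. A real model would likely use bilinear or trilinear interpolation and a learned fusion rather than the fixed product assumed here.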