Radars and cameras are among the most frequently used sensors for advanced driver assistance systems and automated driving research. However, there has been surprisingly little research on radar-camera fusion with neural networks. One reason is the lack of large-scale automotive datasets with radar and unmasked camera data, the nuScenes dataset being the exception. Another is the difficulty of effectively fusing the sparse radar point cloud on the bird's eye view (BEV) plane with the dense images on the perspective plane. The recent trend of camera-based 3D object detection using BEV features has enabled a new type of fusion that is better suited to radar. In this work, we present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it can be incorporated into several state-of-the-art camera-based architectures. We show significant performance gains, with an increase of up to 28% in the nuScenes detection score (NDS), which is an important step in radar-camera fusion research. Without tuning our model for the nuScenes benchmark, we achieve the best result among all published methods in the radar-camera fusion category.
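Fusing on the BEV plane means that sparse radar points must first be placed on the same grid as the dense camera BEV features. The sketch below illustrates that general idea only. It is not BEVFeatureNet and not the authors' implementation. The function names `radar_to_bev` and `fuse_bev` are hypothetical, and the following choices are illustrative assumptions: a square grid of ±51.2 m with 0.8 m cells, sum-pooling of points that share a cell, and fusion by channel concatenation.

```python
import numpy as np


def radar_to_bev(points_xy, feats, extent=51.2, cell=0.8):
    """Scatter sparse radar points onto a dense BEV grid of shape (C, H, W).

    Hypothetical helper (assumed grid extent, cell size and sum-pooling).
    points_xy: (N, 2) ego-frame x/y positions in metres.
    feats:     (N, C) per-point features, e.g. RCS or radial velocity.
    Points outside [-extent, extent) are dropped; points sharing a cell are summed.
    """
    n_cells = int(round(2 * extent / cell))
    grid = np.zeros((n_cells, n_cells, feats.shape[1]), dtype=np.float64)
    # Convert metric coordinates to integer cell indices.
    ix = np.floor((points_xy[:, 0] + extent) / cell).astype(int)
    iy = np.floor((points_xy[:, 1] + extent) / cell).astype(int)
    keep = (ix >= 0) & (ix < n_cells) & (iy >= 0) & (iy < n_cells)
    # Unbuffered add: repeated (iy, ix) pairs accumulate instead of overwriting.
    np.add.at(grid, (iy[keep], ix[keep]), feats[keep])
    # Channels first; rows are indexed by y, columns by x.
    return grid.transpose(2, 0, 1)


def fuse_bev(camera_bev, radar_bev):
    """Hypothetical fusion step: concatenate BEV maps along the channel axis."""
    if camera_bev.shape[1:] != radar_bev.shape[1:]:
        raise ValueError("BEV grids must share the same spatial size")
    return np.concatenate([camera_bev, radar_bev], axis=0)


if __name__ == "__main__":
    pts = np.array([[0.1, 0.1], [0.2, 0.3], [100.0, 0.0]])  # last point is out of range
    f = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    radar_bev = radar_to_bev(pts, f)
    fused = fuse_bev(np.ones((3, 128, 128)), radar_bev)
    print(radar_bev.shape, fused.shape)
```

Once both modalities share one grid, a single operation such as concatenation can combine them cell by cell. Perspective-plane image features and BEV-plane radar points would otherwise have no common coordinate frame.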