Radars and cameras are among the most frequently used sensors for advanced driver assistance systems and automated driving research. However, there has been surprisingly little research on radar-camera fusion with neural networks. One reason is the lack of large-scale automotive datasets with radar and unmasked camera data, the nuScenes dataset being the exception. Another reason is the difficulty of effectively fusing the sparse radar point cloud on the bird's eye view (BEV) plane with the dense images on the perspective plane. The recent trend of camera-based 3D object detection using BEV features has enabled a new type of fusion that is better suited to radar. In this work, we present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it can be incorporated into several state-of-the-art camera-based architectures. We show significant performance gains of up to 28% in the nuScenes detection score (NDS), which is an important step in radar-camera fusion research. Without tuning our model for the nuScenes benchmark, we achieve the best result among all published methods in the radar-camera fusion category.
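To make the idea of fusing on the BEV plane concrete, the following is a minimal sketch under stated assumptions, not the RC-BEVFusion or BEVFeatureNet implementation. It rasterises sparse radar points into a dense BEV grid with hand-picked per-cell statistics and concatenates that grid channel-wise with a camera BEV feature map of the same spatial size. The function names `scatter_radar_to_bev` and `fuse_bev`, the point layout (x, y, radar cross section, radial velocity), the grid extent and the cell size are all hypothetical choices made for illustration. A learned radar encoder would replace the hand-crafted channels.

```python
"""Hypothetical sketch of radar-camera fusion on a shared BEV grid.

Not the RC-BEVFusion / BEVFeatureNet implementation: the point layout,
grid extent, cell size and per-cell radar features are assumptions.
"""
from typing import List, Sequence, Tuple

# Assumed radar detection layout: x, y (metres, ego frame), RCS, radial velocity.
RadarPoint = Tuple[float, float, float, float]
Grid = List[List[List[float]]]  # indexed as [channel][row][col]


def scatter_radar_to_bev(
    points: Sequence[RadarPoint],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> Grid:
    """Rasterise sparse radar points into a dense 3-channel BEV grid.

    Channel 0: number of points in the cell.
    Channel 1: maximum radar cross section in the cell.
    Channel 2: mean radial velocity in the cell.
    Empty cells stay zero; points outside the grid are dropped.
    """
    n_cols = int((x_range[1] - x_range[0]) / cell_size)
    n_rows = int((y_range[1] - y_range[0]) / cell_size)
    count = [[0.0] * n_cols for _ in range(n_rows)]
    rcs_max = [[0.0] * n_cols for _ in range(n_rows)]
    vel_sum = [[0.0] * n_cols for _ in range(n_rows)]

    for x, y, rcs, vel in points:
        # Floor division keeps points left of / below the grid negative,
        # so the bounds check below rejects them.
        col = int((x - x_range[0]) // cell_size)
        row = int((y - y_range[0]) // cell_size)
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            continue
        # RCS may be negative (dBsm), so the first point initialises the max.
        if count[row][col] == 0:
            rcs_max[row][col] = rcs
        else:
            rcs_max[row][col] = max(rcs_max[row][col], rcs)
        count[row][col] += 1
        vel_sum[row][col] += vel

    vel_mean = [
        [vel_sum[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]
    return [count, rcs_max, vel_mean]


def fuse_bev(camera_bev: Grid, radar_bev: Grid) -> Grid:
    """Concatenate camera and radar BEV maps along the channel axis."""
    same_rows = len(camera_bev[0]) == len(radar_bev[0])
    same_cols = len(camera_bev[0][0]) == len(radar_bev[0][0])
    if not (same_rows and same_cols):
        raise ValueError("camera and radar BEV grids must share spatial size")
    return camera_bev + radar_bev
```

Because both modalities live on the same BEV grid, the radar branch only has to agree on grid extent and resolution with whichever camera BEV detector it is attached to, which is what makes this kind of fusion modular. Channel concatenation is shown here as one common way to combine BEV maps; the abstract does not state which fusion operator RC-BEVFusion uses.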