3D object detection is a core perception module in autonomous driving. Although current state-of-the-art (SOTA) algorithms combine camera and LiDAR sensors, the high price of LiDAR means that mainstream deployed solutions use either cameras alone or cameras plus radar. In this study, we propose HVDetFusion, a multi-modal detection algorithm that supports both pure camera input and fused radar-camera input. The camera stream does not depend on radar data, which addresses a drawback of previous methods. In the pure camera stream, we modify the BEVDet4D framework for better perception and more efficient inference, and this stream produces complete 3D detection outputs on its own. To incorporate the benefits of radar signals, we use prior information about object positions to filter false positives from the raw radar data, then use the position and radial velocity measurements recorded by the radar sensors to supplement and fuse with the BEV features generated from the camera data; fusion training improves performance further. HVDetFusion achieves a new state-of-the-art 67.4% NDS on the challenging nuScenes test set among all camera-radar 3D object detectors. The code is available at https://github.com/HVXLab/HVDetFusion
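The radar branch described in the abstract (filter raw radar returns with a position prior, then turn the surviving position and radial velocity measurements into a BEV-aligned map) can be illustrated with a minimal sketch. This is not taken from the HVDetFusion repository: the `RadarPoint` layout, the function names, the 2 m distance threshold, and the tiny 8 x 8 grid are all assumptions chosen for illustration, and the actual filtering rule and fusion operator used by the authors may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class RadarPoint:
    # Hypothetical layout: BEV position in metres (ego frame) plus radial velocity.
    x: float
    y: float
    v_radial: float  # m/s


def filter_by_position_prior(points, object_centers, max_dist=2.0):
    """Keep radar points within max_dist metres of any prior object centre."""
    kept = []
    for p in points:
        if any(math.hypot(p.x - cx, p.y - cy) <= max_dist
               for cx, cy in object_centers):
            kept.append(p)
    return kept


def rasterize_to_bev(points, half_extent=4.0, cell=1.0):
    """Scatter points into a square BEV grid; each cell holds
    [occupancy, radial velocity]. Later points overwrite earlier ones."""
    n = int(2 * half_extent / cell)
    grid = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    for p in points:
        col = int((p.x + half_extent) // cell)
        row = int((p.y + half_extent) // cell)
        if 0 <= row < n and 0 <= col < n:
            grid[row][col] = [1.0, p.v_radial]
    return grid


# Toy data: two returns near an object, one isolated return (likely clutter).
points = [
    RadarPoint(1.2, 0.5, 3.0),
    RadarPoint(-3.5, 3.5, 0.1),
    RadarPoint(1.8, 1.1, 2.8),
]
centers = [(1.5, 0.8)]  # assumed to come from the camera stream's predictions

kept = filter_by_position_prior(points, centers)
bev = rasterize_to_bev(kept)
```

In a full pipeline, a map like `bev` would be concatenated channel-wise with the camera BEV features before the detection head; here it only shows how the isolated return is dropped while the two returns near the prior object centre contribute occupancy and velocity channels.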