In this paper, we propose a novel approach to camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems. Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance. Specifically, we extract 2D features from camera images using a state-of-the-art deep learning architecture and then apply a novel Cross-Domain Spatial Matching (CDSM) transformation to convert these features into 3D space. We then fuse them with features extracted from the radar data using a complementary fusion strategy to produce a final 3D object representation. To demonstrate the effectiveness of our approach, we evaluate it on the NuScenes dataset, comparing it against single-sensor baselines and current state-of-the-art fusion methods. Our results show that the proposed approach outperforms single-sensor solutions and competes directly with other top-performing fusion methods.
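To make the described pipeline concrete, the following is a minimal sketch of the three stages outlined above (camera 2D feature extraction, a CDSM-style lift into a 3D/BEV grid, and complementary fusion with radar features). All module names, channel counts, and tensor shapes are illustrative assumptions and do not reproduce the paper's actual backbone, CDSM transform, or detection head.

```python
# Illustrative sketch only; the real architecture is defined in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDSMFusionSketch(nn.Module):
    def __init__(self, cam_channels=64, radar_channels=32, bev_size=(128, 128)):
        super().__init__()
        self.bev_size = bev_size
        # Stand-in 2D camera feature extractor (the paper uses a
        # state-of-the-art deep learning backbone).
        self.cam_backbone = nn.Sequential(
            nn.Conv2d(3, cam_channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(cam_channels, cam_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Hypothetical CDSM step: map image-plane features into a BEV grid,
        # approximated here by a learned 1x1 projection plus resampling.
        self.cdsm_proj = nn.Conv2d(cam_channels, cam_channels, 1)
        # Radar features are assumed to be pre-rasterized in the same BEV grid.
        self.radar_encoder = nn.Sequential(
            nn.Conv2d(radar_channels, cam_channels, 3, padding=1), nn.ReLU(),
        )
        # Complementary fusion: concatenate both BEV maps and mix them.
        self.fusion_head = nn.Conv2d(2 * cam_channels, cam_channels, 3, padding=1)

    def forward(self, image, radar_bev):
        cam_feat = self.cam_backbone(image)          # 2D features in the image plane
        cam_bev = F.interpolate(                      # crude stand-in for the CDSM
            self.cdsm_proj(cam_feat), size=self.bev_size)  # 2D-to-3D/BEV mapping
        radar_feat = self.radar_encoder(radar_bev)   # radar features in BEV
        fused = self.fusion_head(torch.cat([cam_bev, radar_feat], dim=1))
        return fused                                  # shared 3D/BEV representation

# Toy input shapes; NuScenes inputs would be full camera images and radar sweeps.
model = CDSMFusionSketch()
image = torch.randn(1, 3, 256, 256)
radar_bev = torch.randn(1, 32, 128, 128)
print(model(image, radar_bev).shape)  # torch.Size([1, 64, 128, 128])
```

The sketch only conveys the data flow (camera 2D features lifted into the same spatial domain as radar before fusion); the downstream 3D detection head and training objectives are omitted.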