Accurately localizing 3D objects like pedestrians, cyclists, and other vehicles is essential in autonomous driving. To ensure high detection performance, autonomous vehicles complement RGB cameras with LiDAR sensors, but effectively combining these data sources for 3D object detection remains challenging. We propose LCF3D, a novel sensor fusion framework that combines a 2D object detector on RGB images with a 3D object detector on LiDAR point clouds. By leveraging multimodal fusion principles, we compensate for inaccuracies in the LiDAR object detection network. Our solution combines two key principles: (i) late fusion, to reduce LiDAR false positives by matching LiDAR 3D detections with RGB 2D detections and filtering out unmatched LiDAR detections; and (ii) cascade fusion, to recover objects missed by LiDAR by generating new 3D frustum proposals corresponding to unmatched RGB detections. Experiments show that LCF3D is beneficial for domain generalization, as it successfully handles differing sensor configurations between training and testing domains. LCF3D achieves significant improvements over LiDAR-based methods, particularly for challenging categories like pedestrians and cyclists in the KITTI dataset, as well as motorcycles and bicycles in nuScenes. Code can be downloaded from: https://github.com/CarloSgaravatti/LCF3D.
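The two fusion principles above can be illustrated with a minimal sketch. It assumes LiDAR 3D detections have already been projected into the image plane via camera calibration, and uses greedy axis-aligned IoU matching with an illustrative 0.5 threshold; box formats, function names, and the matching strategy are assumptions, not the authors' implementation.

```python
def iou(a, b):
    """Axis-aligned IoU of two 2D boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def late_cascade_fusion(lidar_boxes_2d, rgb_boxes, iou_thr=0.5):
    """Greedy IoU matching between projected LiDAR boxes and RGB 2D boxes.

    Returns (kept_lidar, unmatched_rgb):
      - kept_lidar: indices of LiDAR detections that matched an RGB
        detection (late fusion: unmatched LiDAR boxes are dropped as
        likely false positives);
      - unmatched_rgb: indices of RGB detections with no LiDAR match
        (cascade fusion: candidates for new 3D frustum proposals).
    """
    matched_rgb = set()
    kept_lidar = []
    for i, lb in enumerate(lidar_boxes_2d):
        best_j, best_iou = -1, iou_thr
        for j, rb in enumerate(rgb_boxes):
            if j in matched_rgb:
                continue
            overlap = iou(lb, rb)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched_rgb.add(best_j)
            kept_lidar.append(i)
    unmatched_rgb = [j for j in range(len(rgb_boxes)) if j not in matched_rgb]
    return kept_lidar, unmatched_rgb
```

For example, a LiDAR box that overlaps no RGB detection is filtered out, while an RGB detection of a distant pedestrian with no LiDAR counterpart is passed on for frustum-proposal generation.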