Cameras and LiDARs are both important sensors for autonomous driving and play critical roles in 3D object detection. Camera-LiDAR fusion has become a prevalent solution for robust and accurate autonomous driving perception. In contrast to the vast majority of existing work, which focuses on improving 3D object detection performance through cross-modal schemes, deep learning algorithms, and training tricks, we turn our attention to the impact of sensor configurations on the performance of learning-based methods. To this end, we propose a unified information-theoretic surrogate metric for camera and LiDAR evaluation, based on a proposed sensor perception model. We also design an accelerated, high-quality framework for data acquisition, model training, and performance evaluation that works with the CARLA simulator. To show the correlation between detection performance and our surrogate metric, we conduct experiments using several camera-LiDAR placements and parameters inspired by self-driving companies and research institutions. Extensive experimental results of representative algorithms on the nuScenes dataset validate the effectiveness of our surrogate metric and demonstrate that sensor configurations significantly affect detection models based on point-cloud-image fusion, contributing up to a 30% discrepancy in average precision.