The fusion of multimodal sensor data streams, such as camera images and lidar point clouds, plays an important role in the operation of autonomous vehicles (AVs). Robust perception across a range of adverse weather and lighting conditions is specifically required for AVs to be deployed widely. While multi-sensor fusion networks have been previously developed for perception in sunny and clear-weather conditions, these methods show a significant degradation in performance under night-time and poor-weather conditions. In this paper, we propose a simple yet effective technique called ContextualFusion to incorporate into 3D object detection models the domain knowledge that cameras and lidars behave differently across lighting and weather variations. Specifically, we design a Gated Convolutional Fusion (GatedConv) approach that fuses the sensor streams based on the operational context. To aid our evaluation, we use the open-source simulator CARLA to create a multimodal adverse-condition dataset called AdverseOp3D, addressing the shortcoming that existing datasets are biased towards daytime and good-weather conditions. Our ContextualFusion approach yields an mAP improvement of 6.2% over state-of-the-art methods on our context-balanced synthetic dataset. Finally, our method enhances state-of-the-art 3D object detection performance at night on the real-world NuScenes dataset, with a significant mAP improvement of 11.7%.
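To make the gating idea concrete, below is a minimal sketch of what context-conditioned gated fusion of camera and lidar feature maps could look like. This is an illustrative assumption, not the paper's actual GatedConv implementation: the class name `ContextGatedFusion`, the two-dimensional context vector (e.g., night/rain flags), and all tensor shapes are hypothetical.

```python
import torch
import torch.nn as nn

class ContextGatedFusion(nn.Module):
    """Hypothetical sketch: fuse camera and lidar BEV features with
    per-modality gates conditioned on an operational-context vector.
    Shapes and the context encoding are illustrative assumptions."""

    def __init__(self, cam_channels: int, lidar_channels: int,
                 out_channels: int, ctx_dim: int = 2):
        super().__init__()
        # Channel-wise gates in (0, 1), predicted from the context
        # (e.g., ctx = [is_night, is_rainy]); at night the camera gate
        # can learn to down-weight image features.
        self.cam_gate = nn.Sequential(nn.Linear(ctx_dim, cam_channels), nn.Sigmoid())
        self.lidar_gate = nn.Sequential(nn.Linear(ctx_dim, lidar_channels), nn.Sigmoid())
        # 1x1 convolution fuses the gated, concatenated feature maps.
        self.fuse = nn.Conv2d(cam_channels + lidar_channels, out_channels, kernel_size=1)

    def forward(self, cam_feat: torch.Tensor, lidar_feat: torch.Tensor,
                ctx: torch.Tensor) -> torch.Tensor:
        # cam_feat: (B, Cc, H, W); lidar_feat: (B, Cl, H, W); ctx: (B, ctx_dim)
        g_cam = self.cam_gate(ctx).unsqueeze(-1).unsqueeze(-1)      # (B, Cc, 1, 1)
        g_lidar = self.lidar_gate(ctx).unsqueeze(-1).unsqueeze(-1)  # (B, Cl, 1, 1)
        fused = torch.cat([cam_feat * g_cam, lidar_feat * g_lidar], dim=1)
        return self.fuse(fused)

# Example usage with assumed channel counts and a "night, clear" context:
fusion = ContextGatedFusion(cam_channels=64, lidar_channels=128, out_channels=128)
ctx = torch.tensor([[1.0, 0.0]])
out = fusion(torch.randn(1, 64, 100, 100), torch.randn(1, 128, 100, 100), ctx)
```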