Integrating LiDAR and camera information into a Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to inaccurate calibration between the LiDAR and camera sensors. Such inaccuracies cause depth-estimation errors in the camera branch, which ultimately misalign the LiDAR and camera BEV features. In this work, we propose a robust fusion framework called Graph BEV. To address errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our Graph BEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEV Fusion by 1.6% on the nuScenes validation set. Importantly, Graph BEV outperforms BEV Fusion by 8.3% under conditions with misalignment noise.