Integrating LiDAR and camera information into a Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to inaccuracies in the calibration relationship between the LiDAR and camera sensors. Such inaccuracies cause errors in depth estimation in the camera branch, ultimately leading to misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called GraphBEV. To address errors caused by inaccurate point cloud projection, we introduce a Local Align module that exploits neighbor-aware depth features via graph matching. In addition, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our GraphBEV framework achieves state-of-the-art performance, with an mAP of 70.1\%, surpassing BEVFusion by 1.6\% on the nuScenes validation set. Importantly, GraphBEV outperforms BEVFusion by 8.3\% under conditions with misalignment noise.
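To make the "neighbor-aware depth features" idea concrete, the following is a minimal illustrative sketch (not the paper's actual implementation): for each LiDAR point projected into the image, it gathers the depths of its k nearest neighbors in pixel space, so a per-pixel depth estimate can draw on local context rather than a single, possibly miscalibrated, projection. The function name and interface are hypothetical.

```python
import numpy as np

def neighbor_aware_depths(pixels, depths, k=4):
    """Hypothetical sketch of neighbor-aware depth gathering.

    pixels: (N, 2) image coordinates of projected LiDAR points.
    depths: (N,) corresponding depth values.
    Returns an (N, k) array holding, for each point, the depths of its
    k nearest neighbors in image space.
    """
    # Pairwise squared distances in pixel space (fine for small N;
    # a real system would use a spatial index or a fixed search window).
    diff = pixels[:, None, :] - pixels[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    np.fill_diagonal(dist2, np.inf)        # exclude each point itself
    idx = np.argsort(dist2, axis=1)[:, :k] # indices of the k nearest neighbors
    return depths[idx]

pixels = np.array([[10.0, 10.0], [11.0, 10.0], [10.0, 11.0],
                   [50.0, 50.0], [51.0, 50.0]])
depths = np.array([5.0, 5.1, 4.9, 20.0, 20.2])
print(neighbor_aware_depths(pixels, depths, k=2))
```

In this toy example, the point at (10, 10) is paired with the depths 5.1 and 4.9 of its two closest projections, giving a local depth neighborhood that a downstream module could aggregate or match against camera depth predictions.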