Semantic map construction under bird's-eye view (BEV) plays an essential role in autonomous driving. In contrast to camera images, LiDAR provides accurate 3D observations, so the captured 3D features can be projected onto the BEV space inherently. However, vanilla LiDAR-based BEV features often contain considerable indefinite noise, and the spatial features carry few texture and semantic cues. In this paper, we propose an effective LiDAR-based method for semantic map construction. Specifically, we introduce a BEV feature pyramid decoder that learns robust multi-scale BEV features for semantic map construction, which greatly boosts the accuracy of the LiDAR-based method. To mitigate the defects caused by the lack of semantic cues in LiDAR data, we present an online Camera-to-LiDAR distillation scheme that facilitates semantic learning from images to point clouds. Our distillation scheme consists of feature-level and logit-level distillation, which absorb semantic information from the camera in BEV space. Experimental results on the challenging nuScenes dataset demonstrate the efficacy of our proposed LiDAR2Map for semantic map construction: it significantly outperforms previous LiDAR-based methods by over 27.9% mIoU and even performs better than state-of-the-art camera-based approaches. Source code is available at: https://github.com/songw-zju/LiDAR2Map.
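To make the two distillation levels concrete, here is a minimal sketch of a generic feature-level plus logit-level distillation loss on BEV grids. It is not the LiDAR2Map formulation, which the abstract does not specify. Several choices here are assumptions: mean squared error for the feature term, temperature-softened KL divergence (Hinton-style knowledge distillation) for the logit term, per-cell softmax over semantic classes, and the hypothetical weights `alpha` and `beta`. Each BEV map is flattened into a list of cells, with the camera-derived branch acting as teacher and the LiDAR branch as student.

```python
import math
from typing import List

# Hypothetical representation: a BEV map flattened to a list of cells,
# each cell holding a feature vector (feature level) or class logits (logit level).
Grid = List[List[float]]


def feature_distill_loss(student: Grid, teacher: Grid) -> float:
    """Assumed feature-level term: mean squared error between BEV features."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher grids must have the same number of cells")
    total, count = 0.0, 0
    for s_cell, t_cell in zip(student, teacher):
        for s, t in zip(s_cell, t_cell):
            total += (s - t) ** 2
            count += 1
    return total / count


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def logit_distill_loss(student: Grid, teacher: Grid, temperature: float = 2.0) -> float:
    """Assumed logit-level term: KL(teacher || student) on softened per-cell
    class distributions, averaged over cells and scaled by T^2."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher grids must have the same number of cells")
    total = 0.0
    for s_cell, t_cell in zip(student, teacher):
        p_t = softmax(t_cell, temperature)
        p_s = softmax(s_cell, temperature)
        # Skip zero-probability teacher classes (their KL contribution is 0).
        total += sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * total / len(student)


def distillation_loss(
    s_feat: Grid, t_feat: Grid, s_logits: Grid, t_logits: Grid,
    alpha: float = 1.0, beta: float = 1.0, temperature: float = 2.0,
) -> float:
    """Weighted sum of both terms; alpha and beta are hypothetical weights."""
    return (alpha * feature_distill_loss(s_feat, t_feat)
            + beta * logit_distill_loss(s_logits, t_logits, temperature))


if __name__ == "__main__":
    # Two BEV cells, 2-D features, 3 semantic classes (toy values).
    lidar_feat, cam_feat = [[1.0, 2.0], [0.5, 0.0]], [[0.0, 2.0], [0.5, 1.0]]
    lidar_logits, cam_logits = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    print(distillation_loss(lidar_feat, cam_feat, lidar_logits, cam_logits))
```

In an actual training loop these terms would be computed on tensors and added to the segmentation loss of the LiDAR branch; the list-based version only illustrates what each level compares.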