Semantic map construction in bird's-eye view (BEV) plays an essential role in autonomous driving. In contrast to camera images, LiDAR provides accurate 3D observations, so the captured 3D features can be projected onto BEV space directly. However, vanilla LiDAR-based BEV features often contain considerable noise, and their spatial features carry few texture and semantic cues. In this paper, we propose an effective LiDAR-based method for building semantic maps. Specifically, we introduce a BEV pyramid feature decoder that learns robust multi-scale BEV features for semantic map construction, which greatly boosts the accuracy of the LiDAR-based method. To mitigate the defects caused by the lack of semantic cues in LiDAR data, we present an online Camera-to-LiDAR distillation scheme that facilitates semantic learning from images to point clouds. Our distillation scheme consists of feature-level and logit-level distillation, which absorb semantic information from the camera in BEV. Experimental results on the challenging nuScenes dataset demonstrate the efficacy of our proposed LiDAR2Map on semantic map construction: it significantly outperforms previous LiDAR-based methods by more than 27.9% mIoU and even performs better than state-of-the-art camera-based approaches. Source code is available at: https://github.com/songw-zju/LiDAR2Map.
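
The two distillation terms named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the loss forms, so everything below is an assumption made for illustration: a mean squared error between BEV feature maps for the feature-level term, a per-cell KL divergence between class distributions for the logit-level term, and hypothetical names and parameters (`feature_distillation_loss`, `logit_distillation_loss`, `w_feat`, `w_logit`, `temperature`). It should not be read as the LiDAR2Map implementation, which is available in the linked repository.

```python
import numpy as np


def log_softmax(logits, axis=0):
    """Numerically stable log-softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def feature_distillation_loss(student_feat, teacher_feat):
    """Assumed feature-level term: MSE between BEV feature maps (C, H, W)."""
    return float(np.mean((student_feat - teacher_feat) ** 2))


def logit_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Assumed logit-level term: KL(teacher || student), averaged over BEV cells.

    Both inputs have shape (K, H, W), i.e. K class logits per BEV cell.
    """
    log_p_teacher = log_softmax(teacher_logits / temperature, axis=0)
    log_p_student = log_softmax(student_logits / temperature, axis=0)
    p_teacher = np.exp(log_p_teacher)
    kl_per_cell = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=0)
    return float(kl_per_cell.mean())


def distillation_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                      w_feat=1.0, w_logit=1.0, temperature=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (w_feat * feature_distillation_loss(student_feat, teacher_feat)
            + w_logit * logit_distillation_loss(student_logits, teacher_logits,
                                                temperature))


# Toy usage: random tensors stand in for LiDAR (student) and camera (teacher)
# BEV features and logits on a small 8 x 8 grid with 3 map classes.
rng = np.random.default_rng(0)
lidar_feat = rng.normal(size=(16, 8, 8))
camera_feat = rng.normal(size=(16, 8, 8))
lidar_logits = rng.normal(size=(3, 8, 8))
camera_logits = rng.normal(size=(3, 8, 8))
total = distillation_loss(lidar_feat, camera_feat, lidar_logits, camera_logits)
print(f"distillation loss on random inputs: {total:.4f}")
```

In this sketch the teacher tensors are treated as fixed targets, so only the LiDAR branch would be pulled toward the camera-derived BEV representation. In a training framework with automatic differentiation, this would typically correspond to stopping gradients through the teacher.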