Attention-Guided Lidar Segmentation and Odometry Using Image-to-Point Cloud Saliency Transfer

LiDAR odometry estimation and 3D semantic segmentation are crucial for autonomous driving, which has advanced remarkably in recent years. However, both tasks remain challenging: 3D semantic segmentation suffers from the imbalance of points across semantic categories, and LiDAR odometry estimation is affected by dynamic objects. Both problems increase the importance of using representative (salient) landmarks as reference points for robust feature learning. To address these challenges, we propose a saliency-guided approach that leverages attention information to improve the performance of LiDAR odometry estimation and semantic segmentation models. Unlike in the image domain, only a few studies have addressed point cloud saliency, owing to the lack of annotated training data. To alleviate this, we first present a universal framework that transfers saliency distribution knowledge from color images to point clouds, and we use it to construct a pseudo-saliency dataset for point clouds (FordSaliency). Then, we adopt point cloud-based backbones to learn the saliency distribution from the pseudo-saliency labels, followed by our proposed SalLiDAR module. SalLiDAR is a saliency-guided 3D semantic segmentation model that integrates saliency information to improve segmentation performance. Finally, we introduce SalLONet, a self-supervised saliency-guided LiDAR odometry network that uses the semantic and saliency predictions of SalLiDAR to achieve better odometry estimation. Our extensive experiments on benchmark datasets demonstrate that the proposed SalLiDAR and SalLONet models achieve state-of-the-art performance compared with existing methods, highlighting the effectiveness of image-to-LiDAR saliency knowledge transfer. Source code will be available at https://github.com/nevrez/SalLONet.
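
The image-to-point cloud saliency transfer described above can be illustrated with a minimal sketch. The abstract does not specify how the transfer is carried out, so the following assumes a simple approach: each LiDAR point is projected into a calibrated camera image with a zero-skew pinhole model, and the saliency value at the resulting pixel becomes that point's pseudo-saliency label. The function name `transfer_saliency`, its parameters, and the choice to assign 0 to points outside the camera view are all hypothetical and are not taken from the paper or its released code.

```python
def transfer_saliency(points, saliency, K, T_cam_lidar):
    """Copy image saliency onto LiDAR points by pinhole projection (illustrative only).

    points:       list of (x, y, z) in the LiDAR frame
    saliency:     H x W nested list of saliency values in [0, 1]
    K:            3 x 3 camera intrinsics (zero skew assumed)
    T_cam_lidar:  4 x 4 rigid transform from LiDAR frame to camera frame
    Returns one pseudo-saliency label per point; points that are behind the
    camera or project outside the image get 0.0 (an assumption of this sketch).
    """
    h, w = len(saliency), len(saliency[0])
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    labels = []
    for x, y, z in points:
        # Move the point from the LiDAR frame into the camera frame.
        xc, yc, zc = (r[0] * x + r[1] * y + r[2] * z + r[3]
                      for r in T_cam_lidar[:3])
        if zc <= 1e-6:
            # Behind (or on) the image plane: not visible to the camera.
            labels.append(0.0)
            continue
        # Perspective projection to the nearest pixel.
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        inside = 0 <= u < w and 0 <= v < h
        labels.append(saliency[v][u] if inside else 0.0)
    return labels
```

A realistic pipeline would also need to handle occlusion and multiple cameras, and would typically be vectorized; this sketch covers only the geometric lookup.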
