Attention-Guided Lidar Segmentation and Odometry Using Image-to-Point Cloud Saliency Transfer

LiDAR odometry estimation and 3D semantic segmentation are crucial for autonomous driving, and both have advanced remarkably in recent years. They remain challenging, however: 3D semantic segmentation suffers from an imbalance in the number of points across semantic categories, and LiDAR odometry estimation is disturbed by dynamic objects. Both problems increase the importance of using representative, salient landmarks as reference points for robust feature learning. To address these challenges, we propose a saliency-guided approach that leverages attention information to improve the performance of LiDAR odometry estimation and semantic segmentation models. Unlike in the image domain, only a few studies have addressed point cloud saliency, owing to the lack of annotated training data. To alleviate this, we first present a universal framework that transfers saliency distribution knowledge from color images to point clouds, and use it to construct a pseudo-saliency dataset (i.e., FordSaliency) for point clouds. We then adopt point cloud-based backbones to learn the saliency distribution from the pseudo-saliency labels, followed by our proposed SalLiDAR module. SalLiDAR is a saliency-guided 3D semantic segmentation model that integrates saliency information to improve segmentation performance. Finally, we introduce SalLONet, a self-supervised saliency-guided LiDAR odometry network that uses the semantic and saliency predictions of SalLiDAR to achieve better odometry estimation. Extensive experiments on benchmark datasets demonstrate that the proposed SalLiDAR and SalLONet models achieve state-of-the-art performance compared with existing methods, highlighting the effectiveness of image-to-LiDAR saliency knowledge transfer. Source code will be available at https://github.com/nevrez/SalLONet.
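
To make the idea of image-to-point cloud saliency transfer concrete, the following is a minimal sketch of one plausible way to produce pseudo-saliency labels: project each LiDAR point into a calibrated camera image and copy the saliency value at the pixel it lands on. This is an illustration under stated assumptions, not the framework described in the abstract. The pinhole camera model, the function name `transfer_saliency`, and the parameters `K` and `T_cam_from_lidar` are all hypothetical; the abstract does not specify how the transfer is performed.

```python
import numpy as np


def transfer_saliency(points, saliency_map, K, T_cam_from_lidar):
    """Copy image saliency onto LiDAR points via pinhole projection.

    Hypothetical helper for illustration only (not the paper's method).

    points           : (N, 3) xyz coordinates in the LiDAR frame
    saliency_map     : (H, W) image saliency values, e.g. in [0, 1]
    K                : (3, 3) camera intrinsic matrix (assumed known)
    T_cam_from_lidar : (4, 4) rigid transform, LiDAR frame -> camera frame
    Returns an (N,) array of pseudo-saliency labels; points that do not
    project into the image are marked NaN.
    """
    n = points.shape[0]
    h, w = saliency_map.shape

    # Move the points into the camera frame using homogeneous coordinates.
    homog = np.hstack([points, np.ones((n, 1))])
    cam = (T_cam_from_lidar @ homog.T).T[:, :3]

    labels = np.full(n, np.nan)

    # Only points in front of the camera can be seen.
    in_front = cam[:, 2] > 1e-6

    # Perspective projection to pixel coordinates (u = column, v = row).
    pix = (K @ cam[in_front].T).T
    u = np.round(pix[:, 0] / pix[:, 2]).astype(int)
    v = np.round(pix[:, 1] / pix[:, 2]).astype(int)

    # Keep only projections that fall inside the image bounds.
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]

    # Nearest-pixel lookup of the saliency value.
    labels[idx] = saliency_map[v[inside], u[inside]]
    return labels
```

In this sketch, points outside the camera's field of view receive no label (NaN), so a downstream point cloud network would have to either ignore them in its loss or obtain labels from additional camera views. Occlusion handling is deliberately omitted here.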
