LiDAR semantic segmentation plays a crucial role in enabling autonomous vehicles and robots to understand their surroundings accurately and robustly. Existing methods fall into several families, including point-based, range image-based, and polar-based approaches. Among these, range image-based methods are widely used because they balance accuracy and speed. However, they face a significant challenge known as the ``many-to-one'' problem: because the range image has limited horizontal and vertical angular resolution, multiple 3D points project to the same pixel, and by our observation around 20% of the 3D points are occluded during model inference. In this paper, we present TFNet, a range image-based LiDAR semantic segmentation method that uses temporal information to address this issue. Specifically, we incorporate a temporal fusion layer that extracts useful information from previous scans and integrates it with the current scan. We then design a max-voting-based post-processing technique to correct false predictions, particularly those caused by the ``many-to-one'' issue. Experiments on two benchmarks and seven backbones of three modalities demonstrate the effectiveness and scalability of the proposed method.
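The ``many-to-one'' problem can be illustrated with a minimal sketch of a spherical range-image projection. This is not the TFNet implementation: the function names, the 64 x 2048 image size, and the field-of-view values are illustrative assumptions, and the sketch only counts how many points collide on a pixel rather than performing any segmentation.

```python
import math


def project_to_range_image(points, height=64, width=2048,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map each 3D point (x, y, z) to a (row, col) pixel by spherical projection.

    Image size and field of view are assumed values, not taken from the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    pixels = []
    for x, y, z in points:
        # Guard against a point at the sensor origin.
        depth = max(math.sqrt(x * x + y * y + z * z), 1e-8)
        yaw = math.atan2(y, x)        # horizontal angle
        pitch = math.asin(z / depth)  # vertical angle
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        # Clamp to the image bounds.
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        pixels.append((row, col))
    return pixels


def many_to_one_fraction(points, **kwargs):
    """Fraction of points that share a pixel with another point.

    Each occupied pixel can keep only one point (typically the closest),
    so every additional point on that pixel is hidden from the network.
    """
    pixels = project_to_range_image(points, **kwargs)
    return 1.0 - len(set(pixels)) / len(points)
```

Calling `many_to_one_fraction` on the points of one scan gives a rough estimate of how many points the range image cannot represent; lowering `height` or `width` makes the collisions more frequent, which is the angular-resolution effect the abstract describes.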