Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Whilst the availability of 3D LiDAR point cloud data has grown significantly in recent years, annotation remains expensive and time-consuming. This drives demand for semi-supervised semantic segmentation methods in application domains such as autonomous driving. Existing work often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational cost. In addition, many methods use uniform sampling to reduce the amount of ground-truth data needed for learning, which often results in sub-optimal performance. To address these issues, we propose a new pipeline that uses a smaller architecture and requires fewer ground-truth annotations, yet achieves superior segmentation accuracy compared to contemporary approaches. This is enabled by a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To sub-sample our training data effectively, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method. It leverages knowledge of sensor motion within the environment to extract a more diverse subset of training frames. To make the most of the limited annotated samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU while using less labeled data on the SemanticKITTI (59.5 mIoU with 5% of labels) and ScribbleKITTI (58.1 mIoU with 5% of labels) benchmark datasets. It does so with a 2.3x reduction in model parameters and 641x fewer multiply-add operations, whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).
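
The parameter saving claimed for the Sparse Depthwise Separable Convolution module can be illustrated with a per-layer count. The following is a minimal sketch that assumes the module follows the standard depthwise-then-pointwise factorization: a k x k x k depthwise filter per input channel followed by a 1x1x1 pointwise convolution. The function names are hypothetical and are not taken from the authors' code. Sparsity changes which voxels are computed, not the number of weights, so the same counts apply to sparse and dense kernels. For a standard layer with weights k^3 * C_in * C_out, the reduction factor is 1 / (1/C_out + 1/k^3).

```python
def standard_conv3d_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weight count of a standard 3D convolution with a k x k x k kernel."""
    params = k ** 3 * c_in * c_out
    return params + (c_out if bias else 0)


def depthwise_separable_conv3d_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weight count of a depthwise k^3 filter per input channel plus a 1x1x1 pointwise mix.

    Assumes the standard depthwise-then-pointwise factorization (hypothetical helper,
    not the paper's implementation).
    """
    depthwise = k ** 3 * c_in   # one k^3 filter per input channel, no channel mixing
    pointwise = c_in * c_out    # 1x1x1 convolution that mixes channels
    extra = (c_in + c_out) if bias else 0
    return depthwise + pointwise + extra


if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3
    std = standard_conv3d_params(c_in, c_out, k)
    sep = depthwise_separable_conv3d_params(c_in, c_out, k)
    # Per-layer reduction factor, equal to 1 / (1/c_out + 1/k**3)
    print(std, sep, round(std / sep, 2))
```

For a 64-to-64-channel layer with a 3x3x3 kernel, this gives 110592 versus 5824 weights (about 19x fewer). The whole-network 2.3x figure reported above is much smaller than this per-layer factor. That is expected if only some layers are factorized or other components dominate the parameter budget, though the abstract does not state which layers are replaced.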
