Semantic segmentation of point clouds usually requires exhaustive human annotation, so learning from unlabeled data or weaker forms of annotation has attracted wide attention as a challenging topic. In this paper, we make the first attempt at fully unsupervised semantic segmentation of point clouds, which aims to delineate semantically meaningful objects without any form of annotation. Previous unsupervised pipelines designed for 2D images fail on point clouds because of: 1) Clustering Ambiguity, caused by the limited scale of data and an imbalanced class distribution; and 2) Irregularity Ambiguity, caused by the irregular sparsity of point clouds. We therefore propose a novel framework, PointDC, which comprises two stages that handle these problems respectively: Cross-Modal Distillation (CMD) and Super-Voxel Clustering (SVC). In the first stage, CMD, multi-view visual features are back-projected into 3D space and aggregated into a unified point feature, which serves as the distillation target for training the point representation. In the second stage, SVC, the point features are aggregated into super-voxels and then fed into an iterative clustering process to discover semantic classes. PointDC yields a significant improvement over prior state-of-the-art unsupervised methods on both the ScanNet-v2 (+18.4 mIoU) and S3DIS (+11.5 mIoU) semantic segmentation benchmarks.
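The back-projection and aggregation step of CMD can be illustrated with a minimal pure-Python sketch. It assumes pinhole cameras with world-to-camera extrinsics `R`, `t` and intrinsics `(fx, fy, cx, cy)`, nearest-pixel sampling, plain averaging across the views that see a point, and a cosine distillation loss. The function names, the view dictionary layout, the averaging rule, and the loss are all hypothetical choices for illustration, not PointDC's actual implementation, and occlusion handling (e.g. depth tests) is omitted.

```python
import math


def project_point(p, R, t, fx, fy, cx, cy):
    """Project a world point to pixel coords (u, v); None if it is behind the camera."""
    x = sum(R[0][k] * p[k] for k in range(3)) + t[0]
    y = sum(R[1][k] * p[k] for k in range(3)) + t[1]
    z = sum(R[2][k] * p[k] for k in range(3)) + t[2]
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def back_project_features(points, views):
    """Average the 2D features of every view that sees each point (hypothetical rule).

    Each view is a dict with keys 'R', 't', 'K' = (fx, fy, cx, cy) and
    'feat' (an H x W x C nested list). Returns one feature per point,
    or None for points that no view observes.
    """
    out = []
    for p in points:
        acc, n = None, 0
        for view in views:
            fx, fy, cx, cy = view["K"]
            uv = project_point(p, view["R"], view["t"], fx, fy, cx, cy)
            if uv is None:
                continue
            # Nearest-pixel sampling; no occlusion test in this sketch.
            u, v = math.floor(uv[0] + 0.5), math.floor(uv[1] + 0.5)
            feat = view["feat"]
            if 0 <= v < len(feat) and 0 <= u < len(feat[0]):
                f = feat[v][u]
                acc = list(f) if acc is None else [a + b for a, b in zip(acc, f)]
                n += 1
        out.append(None if n == 0 else [a / n for a in acc])
    return out


def cosine_distill_loss(student, teacher):
    """Mean (1 - cosine similarity) over points that have a teacher feature."""
    losses = []
    for s, t in zip(student, teacher):
        if t is None:
            continue
        dot = sum(a * b for a, b in zip(s, t))
        ns = math.sqrt(sum(a * a for a in s))
        nt = math.sqrt(sum(b * b for b in t))
        losses.append(1.0 - dot / (ns * nt + 1e-8))
    return sum(losses) / len(losses) if losses else 0.0
```

Points that no view observes get no teacher feature and are skipped by the loss. In this sketch both the 2D feature maps and the student point features are plain lists; in a real pipeline they would be tensors produced by an image encoder and a 3D backbone, respectively.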
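The SVC stage can likewise be sketched in pure Python: point features are mean-pooled within each super-voxel, the pooled vectors are clustered, and every point inherits its super-voxel's cluster label. This sketch assumes the super-voxel partition is already given as one integer id per point (the abstract does not say how it is computed), and it uses plain Lloyd's k-means, deterministically seeded with the first `k` super-voxels, as a stand-in for PointDC's iterative clustering process. All function names are hypothetical.

```python
def pool_supervoxels(features, sv_ids):
    """Mean-pool point features per super-voxel; returns (sorted ids, pooled features)."""
    sums, counts = {}, {}
    for f, s in zip(features, sv_ids):
        if s not in sums:
            sums[s] = [0.0] * len(f)
            counts[s] = 0
        sums[s] = [a + b for a, b in zip(sums[s], f)]
        counts[s] += 1
    ids = sorted(sums)
    return ids, [[a / counts[s] for a in sums[s]] for s in ids]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Plain Lloyd's k-means seeded with the first k vectors (stand-in clustering)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every super-voxel feature.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: recompute each centroid; keep the old one if its cluster is empty.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def segment(features, sv_ids, k):
    """Cluster super-voxels, then broadcast each super-voxel's label to its points."""
    ids, pooled = pool_supervoxels(features, sv_ids)
    sv_labels = dict(zip(ids, kmeans(pooled, k)))
    return [sv_labels[s] for s in sv_ids]
```

Clustering pooled super-voxel features rather than raw points means all points in a super-voxel share one label, which is one plausible way to reduce the noise that irregular sparsity introduces; the cluster indices are arbitrary, so evaluation against ground truth still requires matching clusters to classes.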