Semantic segmentation of point clouds usually requires exhaustive human annotation, so learning from unlabeled data or weaker forms of annotation has attracted wide attention. In this paper, we make the first attempt at fully unsupervised semantic segmentation of point clouds, which aims to delineate semantically meaningful objects without any form of annotation. Existing unsupervised pipelines designed for 2D images fail on point clouds because of: 1) Clustering Ambiguity, caused by the limited amount of data and the imbalanced class distribution; and 2) Irregularity Ambiguity, caused by the irregular sparsity of point clouds. We therefore propose a novel framework, PointDC, which comprises two stages that address these problems respectively: Cross-Modal Distillation (CMD) and Super-Voxel Clustering (SVC). In the first stage, CMD, multi-view visual features are back-projected into 3D space and aggregated into a unified point feature, which is used to distill the point representation during training. In the second stage, SVC, the point features are aggregated into super-voxels and then fed to an iterative clustering process to discover semantic classes. PointDC yields a significant improvement over the prior state-of-the-art unsupervised methods on both the ScanNet-v2 (+18.4 mIoU) and S3DIS (+11.5 mIoU) semantic segmentation benchmarks.
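To make the two stages more concrete, the following is a minimal NumPy sketch of the data flow the abstract describes; it is not the PointDC implementation. It assumes that back-projection has already produced one 2D feature per view and per point together with a visibility mask, that a super-voxel index per point is given, and that plain averaging and vanilla k-means stand in for the aggregation and iterative clustering steps. All function names (`aggregate_views`, `pool_to_supervoxels`, `kmeans`, `segment`) are hypothetical.

```python
import numpy as np


def aggregate_views(view_feats, visible):
    """Average back-projected 2D features over the views that see each point.

    view_feats: (V, N, D) features per view and point (assumed precomputed).
    visible:    (V, N) boolean mask, True where the view observes the point.
    """
    weights = visible.astype(float)[:, :, None]
    total = (view_feats * weights).sum(axis=0)
    # Points seen by no view keep a zero feature instead of dividing by zero.
    n_views = np.maximum(weights.sum(axis=0), 1.0)
    return total / n_views


def pool_to_supervoxels(point_feats, sv_ids, num_sv):
    """Average per-point features inside each super-voxel."""
    sums = np.zeros((num_sv, point_feats.shape[1]))
    np.add.at(sums, sv_ids, point_feats)  # unbuffered scatter-add
    counts = np.bincount(sv_ids, minlength=num_sv).astype(float)
    counts[counts == 0] = 1.0  # empty super-voxels stay at zero
    return sums / counts[:, None]


def kmeans(feats, k, iters=20, seed=0):
    """Vanilla k-means, used here as a stand-in for iterative clustering."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = feats[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels, centers


def segment(point_feats, sv_ids, k):
    """Cluster super-voxel features, then copy each label back to its points."""
    num_sv = int(sv_ids.max()) + 1
    sv_feats = pool_to_supervoxels(point_feats, sv_ids, num_sv)
    sv_labels, _ = kmeans(sv_feats, k)
    return sv_labels[sv_ids]
```

In this sketch, clustering operates on one feature per super-voxel rather than one per point, so every point in a super-voxel receives the same label; this is one way to read how grouping points into super-voxels regularizes the irregular sparsity mentioned above.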