Semantic segmentation of point clouds usually requires exhaustive human annotation, so learning from unlabeled data or weaker forms of annotation has attracted wide attention. In this paper, we make the first attempt at fully unsupervised semantic segmentation of point clouds, which aims to delineate semantically meaningful objects without any form of annotation. Unsupervised pipelines developed for 2D images fail on point clouds for two reasons: 1) Clustering Ambiguity, caused by the limited amount of data and imbalanced class distribution; 2) Irregularity Ambiguity, caused by the irregular sparsity of point clouds. We therefore propose a novel framework, PointDC, which comprises two stages that address these problems respectively: Cross-Modal Distillation (CMD) and Super-Voxel Clustering (SVC). In the first stage, CMD, multi-view visual features are back-projected into 3D space and aggregated into a unified point feature, which is used to distill knowledge into the point representation during training. In the second stage, SVC, the point features are aggregated into super-voxels and then fed to an iterative clustering process to discover semantic classes. PointDC yields a significant improvement over prior state-of-the-art unsupervised methods on both the ScanNet-v2 (+18.4 mIoU) and S3DIS (+11.5 mIoU) semantic segmentation benchmarks.
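To make the two aggregation steps concrete, the following is a minimal NumPy sketch, not the PointDC implementation. It assumes that per-view features have already been back-projected onto the points, that a visibility mask and super-voxel ids are given, that aggregation is a plain mean, and that the iterative clustering is ordinary k-means. None of these choices is specified in the abstract, and the function names `aggregate_multiview`, `pool_to_supervoxels`, and `kmeans` are hypothetical. The distillation training itself is not shown.

```python
import numpy as np


def aggregate_multiview(view_feats, visible):
    """Average back-projected features over the views that see each point.

    view_feats: (V, N, C) features from V views for N points (assumed given).
    visible:    (V, N) boolean mask, True where a view observes a point.
    Returns an (N, C) array; points seen by no view get a zero vector.
    """
    w = visible.astype(view_feats.dtype)[..., None]   # (V, N, 1)
    summed = (view_feats * w).sum(axis=0)             # (N, C)
    counts = w.sum(axis=0)                            # (N, 1)
    return summed / np.maximum(counts, 1.0)


def pool_to_supervoxels(point_feats, sv_ids, num_sv):
    """Mean-pool (N, C) point features into (num_sv, C) super-voxel features."""
    sums = np.zeros((num_sv, point_feats.shape[1]), dtype=point_feats.dtype)
    np.add.at(sums, sv_ids, point_feats)              # scatter-add per super-voxel
    counts = np.bincount(sv_ids, minlength=num_sv).astype(point_feats.dtype)
    return sums / np.maximum(counts, 1.0)[:, None]


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in for the iterative clustering."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each non-empty center to the mean of its members.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Under these assumptions, a point-level prediction is obtained by clustering the super-voxel features and copying each super-voxel's label to its points, for example `labels, _ = kmeans(sv_feats, k)` followed by `point_labels = labels[sv_ids]`. Pooling to super-voxels before clustering reduces the number of samples and smooths over irregular point density, which is the motivation the abstract gives for SVC.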