Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation

In this work, we address the challenging task of few-shot and zero-shot 3D point cloud semantic segmentation. The success of few-shot semantic segmentation in 2D computer vision is mainly driven by pre-training on large-scale datasets such as ImageNet: a feature extractor pre-trained on large-scale 2D data greatly helps 2D few-shot learning. The development of 3D deep learning, however, is hindered by the limited volume and instance modality of available datasets, owing to the significant cost of 3D data collection and annotation. This results in less representative features and large intra-class feature variation in few-shot 3D point cloud segmentation. As a consequence, directly extending popular prototypical methods for 2D few-shot classification and segmentation to 3D point cloud segmentation does not work as well as it does in the 2D domain. To address this issue, we propose a Query-Guided Prototype Adaption (QGPA) module that adapts the prototype from the feature space of the support point clouds to the feature space of the query point clouds. This prototype adaption greatly alleviates the large intra-class feature variation in point clouds and significantly improves the performance of few-shot 3D segmentation. In addition, to enhance the representation of the prototypes, we introduce a Self-Reconstruction (SR) module that encourages each prototype to reconstruct the support mask as well as possible. We further consider zero-shot 3D point cloud semantic segmentation, where no support sample is available. To this end, we introduce category words as semantic information and propose a semantic-visual projection model to bridge the semantic and visual spaces. Our proposed method surpasses state-of-the-art algorithms by a considerable 7.90% and 14.82% under the 2-way 1-shot setting on the S3DIS and ScanNet benchmarks, respectively. Code is available at https://github.com/heshuting555/PAP-FZS3D.
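
For readers unfamiliar with the prototypical baseline that the abstract refers to, the following is a minimal sketch of generic prototype-based few-shot segmentation. It is not the QGPA, SR, or semantic-visual projection module, and it is not taken from the authors' repository. It assumes per-point features have already been extracted. The function names `masked_average_prototype` and `segment_query` and the toy 2-D features are hypothetical, chosen only for illustration. Each class prototype is the mean feature of the support points labeled with that class, and each query point is assigned to the prototype with the highest cosine similarity.

```python
import numpy as np

def masked_average_prototype(support_feats, support_mask):
    """Mean feature of the support points that belong to the class.

    support_feats: (N, C) per-point features (assumed precomputed).
    support_mask:  (N,) binary mask, 1 where the point belongs to the class.
    """
    mask = support_mask.astype(support_feats.dtype)
    # Guard against an empty mask to avoid division by zero.
    return (support_feats * mask[:, None]).sum(axis=0) / max(mask.sum(), 1.0)

def segment_query(query_feats, prototypes):
    """Label each query point with the index of its most similar prototype."""
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    return (q @ p.T).argmax(axis=1)  # cosine similarity, then argmax over classes

# Toy 2-way 1-shot episode with hand-written 2-D features (illustrative only).
support_a = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0]])
mask_a = np.array([1, 1, 0])
support_b = np.array([[0.1, 1.0], [0.0, 0.8], [1.0, 0.0]])
mask_b = np.array([1, 1, 0])

prototypes = np.stack([
    masked_average_prototype(support_a, mask_a),
    masked_average_prototype(support_b, mask_b),
])
query = np.array([[0.8, 0.2], [0.1, 0.7], [1.0, 0.0]])
labels = segment_query(query, prototypes)
print(labels.tolist())
```

In this sketch the prototypes are computed from the support features alone, so any gap between support and query feature distributions carries over directly to the predicted labels. That gap is the intra-class variation problem that the abstract says QGPA is designed to reduce.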
