We investigate transductive zero-shot point cloud semantic segmentation, where the network is trained on seen objects and must also segment unseen objects. 3D geometric elements are essential cues for inferring a novel 3D object type. However, previous methods neglect the fine-grained relationship between language and these 3D geometric elements. To this end, we propose a novel framework that learns the geometric primitives shared by objects of seen and unseen categories and employs a fine-grained alignment between language and the learned geometric primitives. As a result, guided by language, the network recognizes novel objects represented with geometric primitives. Specifically, we formulate a novel point visual representation: the similarity vector of a point's feature to a set of learnable prototypes, where the prototypes automatically encode geometric primitives via back-propagation. In addition, we propose a novel Unknown-aware InfoNCE Loss to align the visual representation with language at a fine-grained level. Extensive experiments show that our method significantly outperforms other state-of-the-art methods in harmonic mean intersection-over-union (hIoU), with improvements of 17.8\%, 30.4\%, 9.2\% and 7.9\% on the S3DIS, ScanNet, SemanticKITTI and nuScenes datasets, respectively. Code is available at https://github.com/runnanchen/Zero-Shot-Point-Cloud-Segmentation.
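To make the two components named in the abstract concrete, the following NumPy sketch illustrates one plausible reading of them. It is not taken from the released code. The abstract does not give the exact formulas, so several choices here are assumptions: cosine similarity followed by a softmax for the similarity vector, the temperature `tau`, the use of `-1` to mark points without a seen-class label, and the treatment of such points (the loss encourages their probability mass to fall on the unseen classes). The array `class_repr` stands in for language embeddings already mapped to the prototype space; how that mapping is learned is not shown. All function and variable names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(x):
    """Numerically stable row-wise log-softmax."""
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def primitive_representation(point_feats, prototypes, tau=0.1):
    """Map (N, D) point features to (N, K) similarity vectors over K prototypes.

    Assumption: cosine similarity followed by a temperature softmax.
    """
    sim = l2_normalize(point_feats) @ l2_normalize(prototypes).T
    return np.exp(log_softmax(sim / tau))


def unknown_aware_info_nce(point_repr, class_repr, labels, unseen_ids, tau=0.1):
    """Contrastive loss between point representations and class representations.

    labels: (N,) integer array holding a seen-class index, or -1 for points
    without a seen-class label (assumed convention).
    """
    logits = l2_normalize(point_repr) @ l2_normalize(class_repr).T / tau
    log_prob = log_softmax(logits)  # (N, C) over seen and unseen classes

    labeled = labels >= 0
    # Labeled points: standard InfoNCE term, pull toward the ground-truth class.
    seen_loss = -log_prob[labeled, labels[labeled]]

    # Unlabeled points (assumed rule): maximize the total probability
    # assigned to the set of unseen classes.
    unseen_log_prob = log_prob[~labeled][:, unseen_ids]
    unknown_loss = -np.log(np.exp(unseen_log_prob).sum(axis=1))

    return np.concatenate([seen_loss, unknown_loss]).mean()


# Toy example: 6 points, 16-dim features, 4 prototypes, 3 seen + 2 unseen classes.
rng = np.random.default_rng(0)
point_feats = rng.normal(size=(6, 16))
prototypes = rng.normal(size=(4, 16))   # learnable in a real model
class_repr = rng.normal(size=(5, 4))    # placeholder for mapped language embeddings
labels = np.array([0, 1, 2, -1, -1, 0])
unseen_ids = [3, 4]

point_repr = primitive_representation(point_feats, prototypes)
loss = unknown_aware_info_nce(point_repr, class_repr, labels, unseen_ids)
print(point_repr.shape, float(loss))
```

In a trainable version, `prototypes` and the language mapping would be parameters updated by back-propagation, which is how the abstract describes the prototypes coming to encode geometric primitives; the random arrays above only exercise the shapes and the loss computation.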