Bridging Language and Geometric Primitives for Zero-shot Point Cloud Segmentation

We investigate transductive zero-shot point cloud semantic segmentation, in which a network trained on seen object categories must also segment unseen ones. 3D geometric elements are essential cues for inferring a novel 3D object type, yet previous methods neglect the fine-grained relationship between language and these elements. We therefore propose a framework that learns the geometric primitives shared by objects of seen and unseen categories and aligns language with the learned primitives at a fine-grained level. Guided by language, the network can then recognize novel objects represented by geometric primitives. Specifically, we formulate a novel point visual representation: the vector of similarities between a point's feature and a set of learnable prototypes, where the prototypes automatically encode geometric primitives through back-propagation. We also propose a novel Unknown-aware InfoNCE Loss that aligns this visual representation with language at a fine-grained level. Extensive experiments show that our method significantly outperforms other state-of-the-art methods in harmonic mean intersection-over-union (hIoU), with improvements of 17.8%, 30.4%, 9.2% and 7.9% on the S3DIS, ScanNet, SemanticKITTI and nuScenes datasets, respectively. Code is available at https://github.com/runnanchen/Zero-Shot-Point-Cloud-Segmentation.
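The two quantities named in the abstract, the per-point similarity vector and the hIoU metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `primitive_similarity` and `harmonic_iou` are hypothetical, cosine similarity is an assumed choice (the abstract says only "similarity"), and hIoU is taken to be the harmonic mean of the mIoU on seen and unseen classes, as is usual in zero-shot segmentation work.

```python
import numpy as np


def primitive_similarity(point_feats, prototypes):
    """Return an (N, K) matrix of similarities between points and prototypes.

    point_feats: (N, D) array of per-point features.
    prototypes:  (K, D) array of prototypes (learnable in a real network).
    Cosine similarity is an assumption made for this sketch.
    """
    feats = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    # Row i is the similarity vector that represents point i.
    return feats @ protos.T


def harmonic_iou(miou_seen, miou_unseen):
    """Harmonic mean of seen-class and unseen-class mIoU (assumed hIoU)."""
    total = miou_seen + miou_unseen
    if total == 0:
        return 0.0
    return 2.0 * miou_seen * miou_unseen / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 16))   # 5 points, 16-dim features
    protos = rng.normal(size=(8, 16))  # 8 prototypes
    print(primitive_similarity(feats, protos).shape)
    print(harmonic_iou(0.6, 0.2))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model that scores well on seen classes but poorly on unseen ones still receives a low hIoU.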


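Point cloud

A point cloud obtained by laser measurement contains 3D coordinates (XYZ) and laser reflection intensity (Intensity). A point cloud obtained by photogrammetry contains 3D coordinates (XYZ) and color information (RGB). A point cloud obtained by combining laser measurement and photogrammetry contains 3D coordinates (XYZ), laser reflection intensity (Intensity), and color information (RGB). Once the spatial coordinates of every sampled point on an object's surface have been acquired, the resulting set of points is called a "point cloud".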
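As a rough illustration of the three variants described above (XYZ with intensity, XYZ with RGB, or all three), a point cloud can be stored as an array with one row per point. The helper `make_point_cloud` and its column layout are assumptions made for this sketch, not a standard file format or library API.

```python
import numpy as np


def make_point_cloud(xyz, intensity=None, rgb=None):
    """Stack per-point attributes into one (N, C) array plus column names.

    xyz:       (N, 3) coordinates, always required.
    intensity: optional (N,) laser reflection intensity.
    rgb:       optional (N, 3) color values.
    """
    xyz = np.asarray(xyz, dtype=np.float32)
    if xyz.ndim != 2 or xyz.shape[1] != 3:
        raise ValueError("xyz must have shape (N, 3)")
    columns, names = [xyz], ["x", "y", "z"]

    if intensity is not None:
        intensity = np.asarray(intensity, dtype=np.float32).reshape(-1, 1)
        if intensity.shape[0] != xyz.shape[0]:
            raise ValueError("intensity must have one value per point")
        columns.append(intensity)
        names.append("intensity")

    if rgb is not None:
        rgb = np.asarray(rgb, dtype=np.float32).reshape(-1, 3)
        if rgb.shape[0] != xyz.shape[0]:
            raise ValueError("rgb must have one triple per point")
        columns.append(rgb)
        names += ["r", "g", "b"]

    return np.hstack(columns), names


if __name__ == "__main__":
    cloud, names = make_point_cloud(
        [[0, 0, 0], [1, 2, 3]],
        intensity=[0.5, 0.9],
        rgb=[[255, 0, 0], [0, 255, 0]],
    )
    print(names, cloud.shape)
```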
