See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data

Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point cloud that are unseen in the training phase. Recent trends favor the pipeline which transfers knowledge from seen classes with labels to unseen classes without labels. They typically align visual features with semantic features obtained from word embedding by the supervision of seen classes' annotations. However, point cloud contains limited information to fully match with semantic features. In fact, the rich appearance information of images is a natural complement to the textureless point cloud, which is not well explored in previous literature. Motivated by this, we propose a novel multi-modal zero-shot learning method to better utilize the complementary information of point clouds and images for more accurate visual-semantic alignment. Extensive experiments are performed in two popular benchmarks, i.e., SemanticKITTI and nuScenes, and our method outperforms current SOTA methods with 52% and 49% improvement on average for unseen class mIoU, respectively.

翻译：零样本点云分割旨在使深度模型能够识别训练阶段未见过的点云中的新物体。当前趋势倾向于采用一种流程，即通过带标签的已知类别知识迁移至无标签的未知类别。这些方法通常利用已知类别注释的监督，将视觉特征与从词嵌入中获取的语义特征进行对齐。然而，点云包含的有限信息难以与语义特征完全匹配。实际上，图像丰富的表观信息可作为无纹理点云的自然补充，而此前文献对此探索不足。受此启发，我们提出一种新型多模态零样本学习方法，以更好地利用点云与图像的互补信息，实现更精准的视觉-语义对齐。我们在两个主流基准数据集（即SemanticKITTI和nuScenes）上进行了大量实验，所提方法在未知类别的平均交并比上分别较现有最先进方法提升52%和49%。

相关内容

点云

关注 50

根据激光测量原理得到的点云，包括三维坐标（XYZ）和激光反射强度（Intensity）。根据摄影测量原理得到的点云，包括三维坐标（XYZ）和颜色信息（RGB）。结合激光测量和摄影测量原理得到点云，包括三维坐标（XYZ）、激光反射强度（Intensity）和颜色信息（RGB）。在获取物体表面每个采样点的空间坐标后，得到的是一个点的集合，称之为“点云”(Point Cloud)

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【KDD 2020】M2GRL: 一个多任务多视角图表示学习框架的Web-scale的推荐系统，M2GRL: A Multi-task Multi-view Graph Representation Learning Framework for Web-scale Recommender Systems

专知会员服务

29+阅读 · 2020年6月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日