Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation

The past few years have witnessed the great success and prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the field of 3D point cloud learning. Different from existing pre-training paradigms designed for deep point cloud feature extractors that fall into the scope of generative modeling or contrastive learning, this paper proposes a translative pre-training framework, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images. More specifically, we begin with deducing view-conditioned point-wise embeddings through the insertion of the viewpoint indicator, and then adaptively aggregate a view-specific global codeword, which can be further fed into subsequent 2D convolutional translation heads for image generation. Extensive experimental evaluations on various downstream task scenarios demonstrate that our PointVST shows consistent and prominent performance superiority over current state-of-the-art approaches as well as satisfactory domain transfer capability. Our code will be publicly available at https://github.com/keeganhk/PointVST.

翻译：过去数年间，自监督表征学习在语言与2D视觉领域取得了巨大成功并得到广泛应用。然而，这类进展尚未完全迁移至3D点云学习领域。不同于现有面向深度点云特征提取器的预训练范式（其范畴涵盖生成式建模或对比学习），本文提出一种迁移式预训练框架PointVST，该框架基于一种新颖的自监督前置任务——将3D点云跨模态转换为对应的多样化2D渲染图像。具体而言，我们首先通过插入视点指示器推导出视点条件化的逐点嵌入，然后自适应聚合视点特定全局码字，该码字可进一步输入后续2D卷积转换头以生成图像。在各类下游任务场景上的广泛实验评估表明，我们的PointVST相较于当前最先进方法展现出持续且显著的性能优势，同时具备令人满意的领域迁移能力。我们的代码将公开发布于https://github.com/keeganhk/PointVST。

相关内容

点云

关注 50

根据激光测量原理得到的点云，包括三维坐标（XYZ）和激光反射强度（Intensity）。根据摄影测量原理得到的点云，包括三维坐标（XYZ）和颜色信息（RGB）。结合激光测量和摄影测量原理得到点云，包括三维坐标（XYZ）、激光反射强度（Intensity）和颜色信息（RGB）。在获取物体表面每个采样点的空间坐标后，得到的是一个点的集合，称之为“点云”(Point Cloud)

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日