Masked Autoencoders in 3D Point Cloud Representation Learning

Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. However, self-supervised learning based on masking local surface patches of 3D point cloud data remains under-explored. In this paper, we propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. First, we split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of the unmasked patches. Second, we employ patch-wise MAE3D Transformers to learn both the local features of point cloud patches and the high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module, trained with a multi-task loss, completes the incomplete point cloud. We conduct self-supervised pre-training on ShapeNet55 with point cloud completion as the pretext task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB\_T50\_RS, the hardest variant). Comprehensive experiments demonstrate that the local features extracted by MAE3D from point cloud patches are beneficial for downstream classification tasks, soundly outperforming state-of-the-art methods ($93.4\%$ and $86.2\%$ classification accuracy, respectively).
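The first step described in the abstract, splitting a point cloud into patches and masking a portion of them, can be illustrated with a minimal NumPy sketch. The abstract does not say how patches are formed, so the sketch assumes a common choice: patch centers picked by farthest point sampling, each patch made of the center's nearest neighbors. The function names, the patch count, the patch size, and the mask ratio are all hypothetical values chosen for illustration, not the authors' settings.

```python
import numpy as np


def farthest_point_sample(points, num_centers, rng):
    """Greedy farthest point sampling; returns indices into `points`."""
    n = points.shape[0]
    chosen = np.empty(num_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)
    # Squared distance from each point to its nearest chosen center so far.
    min_dist = np.full(n, np.inf)
    for i in range(1, num_centers):
        diff = points - points[chosen[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(min_dist))
    return chosen


def split_and_mask(points, num_patches=64, patch_size=32, mask_ratio=0.6, seed=0):
    """Split an (N, 3) cloud into patches and randomly mask some of them.

    Assumed scheme (not confirmed by the abstract): FPS centers + k nearest
    neighbors per center, followed by uniform random patch masking.
    """
    rng = np.random.default_rng(seed)
    centers = points[farthest_point_sample(points, num_patches, rng)]
    # Squared distances between every center and every point: (P, N).
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]
    # Express each patch relative to its own center.
    patches = points[knn_idx] - centers[:, None, :]
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return patches, centers, mask


if __name__ == "__main__":
    cloud = np.random.default_rng(1).standard_normal((1024, 3))
    patches, centers, mask = split_and_mask(cloud)
    visible = patches[~mask]  # only these would be fed to an encoder
    print(patches.shape, visible.shape, int(mask.sum()))
```

In this sketch only the `visible` patches would be passed to an embedding module, while the masked patches would serve as reconstruction targets for a completion-style pretext task.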


