Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning

Masked autoencoders have demonstrated their effectiveness in self-supervised point cloud learning. Since masking is one kind of corruption, in this work we explore a more general denoising autoencoder for point cloud learning (Point-DAE) by investigating more types of corruption beyond masking. Specifically, we degrade the input point cloud with certain corruptions and learn an encoder-decoder model to reconstruct the original point cloud from its corrupted version. Three corruption families (i.e., density/masking, noise, and affine transformation), comprising fourteen corruption types in total, are investigated with traditional non-Transformer encoders. Besides the popular masking corruption, we identify another effective corruption family, i.e., affine transformation. The affine transformation disturbs all points globally, which is complementary to masking corruption, where some local regions are dropped. We also validate the effectiveness of affine transformation corruption with Transformer backbones, where we decompose the reconstruction of the complete point cloud into reconstructions of detailed local patches and the rough global shape, alleviating the position leakage problem in reconstruction. Extensive experiments on object classification, few-shot learning, robustness testing, part segmentation, and 3D object detection validate the effectiveness of the proposed method. The code is available at https://github.com/YBZh/Point-DAE.
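The corrupt-then-reconstruct setup above can be sketched as follows. This is a minimal illustration assuming NumPy and a point cloud stored as an (N, 3) array; it is not the Point-DAE implementation (see the linked repository for the authors' code). All function names (`random_affine`, `block_mask`, `gaussian_noise`, `chamfer_distance`, `group_patches`, `decomposed_loss`) and parameter ranges are hypothetical. The affine corruption moves every point, the masking corruption drops one local region, and the Chamfer distance is used here as one common reconstruction loss between point sets. The last two functions are an assumed reading of the local-patch/global-shape decomposition: patch points are expressed relative to their centers, so the local term carries no absolute position, while a separate term reconstructs the rough shape formed by the centers.

```python
import numpy as np


def random_affine(points, rng, scale_range=(0.8, 1.2), max_shear=0.2, max_shift=0.1):
    """Global corruption: apply one random affine map x -> A x + t to every point."""
    a = np.diag(rng.uniform(*scale_range, size=3))  # anisotropic scaling
    a += rng.uniform(-max_shear, max_shear, size=(3, 3)) * (1 - np.eye(3))  # shear terms
    t = rng.uniform(-max_shift, max_shift, size=3)  # translation
    return points @ a.T + t


def block_mask(points, rng, keep_ratio=0.6):
    """Local corruption: drop the points closest to a randomly chosen center point."""
    center = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - center, axis=1)
    n_keep = int(len(points) * keep_ratio)
    keep = np.argsort(dist)[len(points) - n_keep:]  # the n_keep farthest points survive
    return points[keep]


def gaussian_noise(points, rng, sigma=0.01):
    """Noise corruption: jitter each point independently."""
    return points + rng.normal(0.0, sigma, size=points.shape)


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3)."""
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)  # (N, M) squared distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def group_patches(points, rng, n_groups=32, k=16):
    """Split a cloud into k-NN patches around random centers (FPS is a common alternative)."""
    centers = points[rng.choice(len(points), n_groups, replace=False)]
    d = np.sum((centers[:, None, :] - points[None, :, :]) ** 2, axis=-1)  # (G, N)
    idx = np.argsort(d, axis=1)[:, :k]  # k nearest points per center
    return centers, points[idx] - centers[:, None, :]  # patches in local coordinates


def decomposed_loss(pred_centers, pred_patches, centers, patches):
    """Local-detail term on centered patches plus a rough global-shape term on centers."""
    local = np.mean([chamfer_distance(a, b) for a, b in zip(pred_patches, patches)])
    return local + chamfer_distance(pred_centers, centers)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal((1024, 3))
    corrupted = block_mask(random_affine(clean, rng), rng)  # affine + masking combined
    centers, patches = group_patches(clean, rng)
    # A (hypothetical) encoder-decoder would map `corrupted` to predictions scored
    # against the clean targets, e.g. chamfer_distance(pred, clean) or
    # decomposed_loss(pred_centers, pred_patches, centers, patches).
    print(corrupted.shape, patches.shape)  # prints (614, 3) (32, 16, 3)
```

Composing `random_affine` and `block_mask` in the usage example reflects the complementary relation described above: one corruption perturbs the whole cloud, the other removes a local region.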



Point cloud


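After the spatial coordinates of each sampled point on an object's surface are acquired, the resulting set of points is called a "point cloud". A point cloud obtained by laser measurement contains three-dimensional coordinates (XYZ) and laser reflection intensity (Intensity). A point cloud obtained by photogrammetry contains three-dimensional coordinates (XYZ) and color information (RGB). A point cloud obtained by combining laser measurement and photogrammetry contains three-dimensional coordinates (XYZ), laser reflection intensity (Intensity), and color information (RGB).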
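One way to hold the per-point attributes listed above in memory is a NumPy structured array. This is a minimal sketch; the field names (`xyz`, `intensity`, `rgb`), their dtypes, and the helper `make_cloud` are illustrative assumptions, not a standard file format.

```python
import numpy as np

# Hypothetical per-point record: coordinates, laser intensity, and color.
point_dtype = np.dtype([
    ("xyz", np.float32, (3,)),
    ("intensity", np.float32),
    ("rgb", np.uint8, (3,)),
])


def make_cloud(xyz, intensity=None, rgb=None):
    """Pack per-point attributes into a structured array; absent fields stay zero."""
    cloud = np.zeros(len(xyz), dtype=point_dtype)
    cloud["xyz"] = xyz
    if intensity is not None:  # e.g. laser-scanned clouds
        cloud["intensity"] = intensity
    if rgb is not None:  # e.g. photogrammetric clouds
        cloud["rgb"] = rgb
    return cloud


laser_cloud = make_cloud([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], intensity=[0.5, 0.25])
photo_cloud = make_cloud([[0.0, 1.0, 2.0]], rgb=[[255, 0, 0]])
```

A laser-only cloud fills `xyz` and `intensity`, a photogrammetric cloud fills `xyz` and `rgb`, and a fused cloud fills all three fields.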
