Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning

Masked autoencoders have demonstrated their effectiveness in self-supervised point cloud learning. Considering that masking is a kind of corruption, in this work we explore a more general denoising autoencoder for point cloud learning (Point-DAE) by investigating more types of corruption beyond masking. Specifically, we degrade the point cloud with certain corruptions as input, and learn an encoder-decoder model to reconstruct the original point cloud from its corrupted version. Three corruption families (i.e., density/masking, noise, and affine transformation) and a total of fourteen corruption types are investigated with traditional non-Transformer encoders. Besides the popular masking corruption, we identify another effective corruption family, i.e., affine transformation. The affine transformation disturbs all points globally, which is complementary to the masking corruption, where some local regions are dropped. We also validate the effectiveness of the affine transformation corruption with Transformer backbones, where we decompose the reconstruction of the complete point cloud into the reconstruction of detailed local patches and of the rough global shape, alleviating the position leakage problem in the reconstruction. Extensive experiments on object classification, few-shot learning, robustness testing, part segmentation, and 3D object detection validate the effectiveness of the proposed method. The code is available at https://github.com/YBZh/Point-DAE.
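To make the two complementary corruption families concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names, the parameter ranges (rotation about the z-axis, scale range, shift range, drop ratio), and the use of the Chamfer distance as the reconstruction objective are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def affine_corrupt(points, rng, max_angle=np.pi, scale_range=(0.8, 1.25), max_shift=0.1):
    """Global corruption: rotate (about z), scale, and shift every point.

    The parameter ranges are assumptions chosen for illustration only.
    """
    theta = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = rng.uniform(*scale_range)
    shift = rng.uniform(-max_shift, max_shift, size=3)
    return scale * points @ rotation.T + shift


def mask_corrupt(points, rng, drop_ratio=0.5):
    """Local corruption: drop the region closest to a randomly chosen seed point."""
    seed = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - seed, axis=1)
    n_drop = int(drop_ratio * len(points))
    keep = np.argsort(dist)[n_drop:]  # discard the n_drop nearest points
    return points[keep]


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (squared Euclidean) between two point sets.

    Assumed here as the reconstruction loss; it handles sets of different sizes.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


rng = np.random.default_rng(0)
clean = rng.normal(size=(1024, 3))  # stand-in for a real point cloud
corrupted = mask_corrupt(affine_corrupt(clean, rng), rng)

# A denoising autoencoder would be trained so that decoder(encoder(corrupted))
# is close to `clean`. Here we only measure how far the corrupted input is
# from the target, i.e. the loss of an identity "model".
loss_identity = chamfer_distance(corrupted, clean)
```

In this sketch the affine step moves every point, while the masking step removes one contiguous region, which mirrors the global-versus-local distinction drawn in the abstract. A trained encoder-decoder would replace the identity baseline and be optimized to drive the reconstruction loss below `loss_identity`.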

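Point cloud

A point cloud obtained by laser measurement contains 3D coordinates (XYZ) and laser reflection intensity (Intensity). A point cloud obtained by photogrammetry contains 3D coordinates (XYZ) and color information (RGB). A point cloud obtained by combining laser measurement and photogrammetry contains 3D coordinates (XYZ), laser reflection intensity (Intensity), and color information (RGB). Once the spatial coordinates of each sampled point on an object's surface have been acquired, the result is a set of points, which is called a "point cloud".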
