GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

This paper addresses a fundamental question in point cloud self-supervised learning: what is a good signal to leverage for learning features from point clouds without annotations? To answer it, we introduce a point cloud representation learning framework based on geometric feature reconstruction. In contrast to recent papers that directly adopt the masked autoencoder (MAE) and only predict original coordinates or occupancy from masked point clouds, our method revisits the differences between images and point clouds and identifies three self-supervised learning objectives peculiar to point clouds, namely centroid prediction, normal estimation, and curvature prediction. Combined with occupancy prediction, these four objectives yield a nontrivial self-supervised learning task and mutually help models reason better about the fine-grained geometry of point clouds. Our pipeline is conceptually simple and consists of two major steps: first, it randomly masks out groups of points and feeds the remaining points to a Transformer-based point cloud encoder; second, a lightweight Transformer decoder predicts the centroid, normal, and curvature for the points in each voxel. We transfer the pre-trained Transformer encoder to a downstream perception model. On the nuScenes dataset, our model achieves a 3.38 mAP improvement for object detection, a 2.1 mIoU gain for segmentation, and a 1.7 AMOTA gain for multi-object tracking. We also conduct experiments on the Waymo Open Dataset and achieve significant performance improvements over baselines as well.
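To make the per-voxel targets concrete, the following is a minimal sketch of how a centroid, a normal, and a curvature value could be computed for the points in one voxel, together with a random voxel mask. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how normals or curvature are estimated, so the sketch uses a common PCA-based estimate (the normal is the covariance eigenvector with the smallest eigenvalue, and curvature is approximated by the "surface variation" ratio of that eigenvalue to the eigenvalue sum). The function names `voxel_geometric_targets` and `random_voxel_mask`, and the mask ratio of 0.7, are hypothetical choices made for this example.

```python
import numpy as np


def voxel_geometric_targets(points):
    """Return (centroid, unit normal, curvature) for the points of one voxel.

    Assumption: PCA-based estimates, which the abstract does not specify.
    points: array of shape (N, 3) with N >= 3.
    """
    points = np.asarray(points, dtype=np.float64)
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = centered.T @ centered / len(points)  # 3x3 covariance matrix
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]  # direction of least variance; sign is ambiguous
    total = eigvals.sum()
    # Surface variation: near 0 for a flat patch, larger for curved ones.
    curvature = float(eigvals[0] / total) if total > 0 else 0.0
    return centroid, normal, curvature


def random_voxel_mask(num_voxels, mask_ratio, rng):
    """Boolean mask with round(mask_ratio * num_voxels) voxels set to True."""
    num_masked = int(round(mask_ratio * num_voxels))
    mask = np.zeros(num_voxels, dtype=bool)
    mask[rng.choice(num_voxels, size=num_masked, replace=False)] = True
    return mask


rng = np.random.default_rng(0)

# Toy voxel: 200 points sampled on the plane z = 0.
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
plane = np.column_stack([xy, np.zeros(200)])
centroid, normal, curvature = voxel_geometric_targets(plane)

# Hypothetical mask ratio; masked voxels are the ones a decoder would predict.
mask = random_voxel_mask(num_voxels=10, mask_ratio=0.7, rng=rng)
```

For the planar toy voxel, the estimated normal is parallel to the z-axis (up to sign) and the curvature value is essentially zero, which is the behavior one would expect from a flat surface patch. In a pre-training setup along the lines the abstract describes, quantities like these would serve as regression targets for the masked voxels.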


