Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

Following the trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential for boosting the performance of foundation models in 2D vision. In 3D vision, however, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. As the pre-training task, we generate view images from different instructed poses via a cross-attention mechanism. Generating view images provides more precise supervision than generating point clouds, which helps 3D backbones gain a finer understanding of the geometric structure and stereoscopic relations of the point cloud. Experimental results demonstrate the superiority of our 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuned on ScanObjectNN classification and ShapeNetPart segmentation. Code is available at https://github.com/wangzy22/TAP.
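The core idea of decoding a view image for an instructed pose by letting per-pixel queries cross-attend to per-point features can be sketched in plain Python. This is a minimal, single-head illustration under stated assumptions, not the TAP implementation: the query design in `pixel_queries` (camera yaw plus normalized pixel coordinates), the toy 4-d point features and 1-d point values, and all function names are hypothetical. A real model would use learned projections, multiple heads, and a proper image decoder.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Every query attends over all keys, so the result does not depend
    on the order in which the points are listed.
    """
    scale = math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(dim_v)])
    return outputs

def pixel_queries(yaw, height, width):
    """HYPOTHETICAL query design: camera yaw plus normalized pixel coordinates."""
    queries = []
    for r in range(height):
        for c in range(width):
            u, v = (c + 0.5) / width, (r + 0.5) / height
            queries.append([math.cos(yaw), math.sin(yaw), u, v])
    return queries

def generate_view(point_feats, point_values, yaw, height, width):
    """Decode a single-channel height x width view image for the instructed pose."""
    flat = cross_attention(pixel_queries(yaw, height, width), point_feats, point_values)
    return [[flat[r * width + c][0] for c in range(width)] for r in range(height)]

def pixel_mse(pred, target):
    """Pixel-wise MSE: every predicted pixel is compared with a fixed target pixel."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)

# Toy stand-ins for backbone outputs: a 4-d feature (matching the 4-d queries)
# and a 1-d value per point.
feats = [[1.0, 0.0, 0.2, 0.8], [0.0, 1.0, 0.9, 0.1], [-1.0, 0.0, 0.5, 0.5]]
vals = [[0.1], [0.9], [0.5]]
img = generate_view(feats, vals, yaw=0.0, height=2, width=3)
print(len(img), len(img[0]))  # 2 3
```

Because the points enter only through a softmax-weighted sum, permuting them leaves the generated image unchanged, while the output image itself has a fixed pixel layout, so a pixel-wise loss such as `pixel_mse` can compare it directly with a target view rendered from the same pose. This is one plausible reading of why 2D targets give more precise supervision; the paper's actual loss and target images may differ.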

Point Cloud

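After the spatial coordinates of every sampled point on an object's surface have been acquired, the resulting set of points is called a "point cloud". A point cloud obtained by laser measurement contains 3D coordinates (XYZ) and laser reflection intensity (Intensity). A point cloud obtained by photogrammetry contains 3D coordinates (XYZ) and color information (RGB). A point cloud obtained by combining laser measurement and photogrammetry contains 3D coordinates (XYZ), laser reflection intensity (Intensity), and color information (RGB).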
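The three attribute layouts described above map naturally onto a single record type with optional fields. The sketch below is illustrative only: the `Point` class and the `acquisition_principle` helper are hypothetical names, and the rule it applies (checking which optional attributes every point carries) is an assumption for demonstration, not a standard classification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Point:
    """One sampled point on an object surface."""
    xyz: Tuple[float, float, float]               # always present
    intensity: Optional[float] = None             # laser reflection intensity
    rgb: Optional[Tuple[int, int, int]] = None    # color from photogrammetry

def acquisition_principle(cloud):
    """Guess the measurement principle from the attributes every point carries."""
    if not cloud:
        return "empty"
    has_intensity = all(p.intensity is not None for p in cloud)
    has_rgb = all(p.rgb is not None for p in cloud)
    if has_intensity and has_rgb:
        return "laser + photogrammetry"
    if has_intensity:
        return "laser measurement"
    if has_rgb:
        return "photogrammetry"
    return "coordinates only"

# A point cloud is an unordered set of points; a frozenset makes that explicit
# (frozen dataclasses are hashable, so Points can be set members).
lidar_scan = frozenset({Point((0.0, 0.0, 1.0), intensity=0.7),
                        Point((0.5, 0.1, 1.2), intensity=0.4)})
print(acquisition_principle(lidar_scan))  # laser measurement
```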