Pre-trained point cloud models based on Masked Point Modeling (MPM) have exhibited substantial improvements across various tasks. However, two drawbacks hinder their practical application. First, feeding the positional embeddings of masked patches to the decoder leaks their central coordinates, leading to limited 3D representations. Second, the excessive model size of existing MPM methods imposes higher demands on devices. To address these issues, we propose to pre-train a \textbf{Point} cloud \textbf{C}ompact model with \textbf{P}artial-aware \textbf{R}econstruction, named Point-CPR. Specifically, in the decoder, we couple the vanilla masked tokens with their positional embeddings into random masked queries and introduce a partial-aware prediction module before each decoder layer to predict them from the unmasked parts. This prevents the decoder from creating a shortcut between the central coordinates of masked patches and their reconstructed coordinates, enhancing the robustness of the model. We also devise a compact encoder composed of local aggregation and MLPs, which reduces the parameters and computational requirements compared to existing Transformer-based encoders. Extensive experiments demonstrate that our model achieves strong performance across various tasks, notably surpassing the leading MPM-based model PointGPT-B with only 2% of its parameters.
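The masking scheme described above can be illustrated with a minimal NumPy sketch. This is our reading of the abstract, not the paper's implementation: all sizes, names, and the single-head attention step are illustrative assumptions. The key contrast is that the masked queries are randomly initialized and predicted from the unmasked partial, so the decoder never sees the masked patches' positional embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (sizes are illustrative, not from the paper).
num_patches, dim, mask_ratio = 8, 16, 0.5
tokens = rng.standard_normal((num_patches, dim))    # encoded patch tokens
pos_emb = rng.standard_normal((num_patches, dim))   # per-patch positional embeddings

# Randomly split patches into masked / unmasked sets.
num_masked = int(num_patches * mask_ratio)
perm = rng.permutation(num_patches)
masked_idx, unmasked_idx = perm[:num_masked], perm[num_masked:]

# Vanilla MPM decoders add pos_emb[masked_idx] to the masked tokens,
# leaking the masked centers. Here the masked queries are random and
# only the unmasked patches expose their positions.
queries = rng.standard_normal((num_masked, dim))     # random masked queries
keys = tokens[unmasked_idx] + pos_emb[unmasked_idx]  # visible partial only

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One partial-aware prediction step: cross-attention from the masked
# queries to the unmasked tokens (a stand-in for the module placed
# before each decoder layer).
attn = softmax(queries @ keys.T / np.sqrt(dim))
predicted = attn @ tokens[unmasked_idx]
```

Because `predicted` is built solely from the unmasked partial, reconstructing the masked coordinates cannot degenerate into copying the masked positional embeddings back out.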