Pre-trained point cloud models based on Masked Point Modeling (MPM) have exhibited substantial improvements across various tasks. However, two drawbacks hinder their practical application. First, feeding the positional embeddings of masked patches to the decoder leaks their central coordinates, leading to limited 3D representations. Second, the excessive model size of existing MPM methods imposes higher demands on devices. To address these issues, we propose to pre-train a \textbf{Point} cloud \textbf{C}ompact model with \textbf{P}artial-aware \textbf{R}econstruction, named Point-CPR. Specifically, in the decoder, we couple the vanilla masked tokens with their positional embeddings into random masked queries and introduce a partial-aware prediction module before each decoder layer to predict them from the unmasked parts. This prevents the decoder from creating a shortcut between the central coordinates of masked patches and their reconstructed coordinates, enhancing the robustness of the model. We also devise a compact encoder composed of local aggregation and MLPs, which reduces the parameters and computational requirements compared to existing Transformer-based encoders. Extensive experiments demonstrate that our model achieves strong performance across various tasks, notably surpassing the leading MPM-based model PointGPT-B with only 2% of its parameters.
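The masking scheme described above can be illustrated with a minimal NumPy sketch. This is our reading of the abstract, not the paper's implementation: all sizes, names, and the single-head attention step are illustrative assumptions. The key contrast is that the masked queries are randomly initialized and predicted from the unmasked partial, so the decoder never sees the masked patches' positional embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (sizes are illustrative, not from the paper).
num_patches, dim, mask_ratio = 8, 16, 0.5
tokens = rng.standard_normal((num_patches, dim))    # encoded patch tokens
pos_emb = rng.standard_normal((num_patches, dim))   # per-patch positional embeddings

# Randomly split patches into masked / unmasked sets.
num_masked = int(num_patches * mask_ratio)
perm = rng.permutation(num_patches)
masked_idx, unmasked_idx = perm[:num_masked], perm[num_masked:]

# Vanilla MPM decoders add pos_emb[masked_idx] to the masked tokens,
# leaking the masked centers. Here the masked queries are random and
# only the unmasked patches expose their positions.
queries = rng.standard_normal((num_masked, dim))     # random masked queries
keys = tokens[unmasked_idx] + pos_emb[unmasked_idx]  # visible partial only

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One partial-aware prediction step: cross-attention from the masked
# queries to the unmasked tokens (a stand-in for the module placed
# before each decoder layer).
attn = softmax(queries @ keys.T / np.sqrt(dim))
predicted = attn @ tokens[unmasked_idx]
```

Because `predicted` is built solely from the unmasked partial, reconstructing the masked coordinates cannot degenerate into copying the masked positional embeddings back out.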