Pre-trained point cloud models based on Masked Point Modeling (MPM) have shown substantial improvements across various tasks. However, these models rely heavily on the Transformer, which leads to quadratic complexity and a limited decoder and hinders their practical application. To address this limitation, we first conduct a comprehensive analysis of existing Transformer-based MPM, emphasizing that redundancy reduction is crucial for point cloud analysis. To this end, we propose a Locally constrained Compact point cloud Model (LCM) consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder. Our encoder replaces self-attention with local aggregation layers to balance performance and efficiency. Because masked and unmasked patches in the decoder inputs of MPM differ in information density, we introduce a locally constrained Mamba-based decoder. This decoder ensures linear complexity while maximizing the perception of point cloud geometry information from the unmasked patches, which have higher information density. Extensive experimental results show that our compact model significantly surpasses existing Transformer-based models in both performance and efficiency. In particular, compared with its Transformer-based counterpart, our LCM-based Point-MAE model improves performance by 2.24%, 0.87%, and 0.94% on the three variants of ScanObjectNN while reducing parameters by 88% and computation by 73%.
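To make the idea of replacing self-attention with local aggregation concrete, the following is a minimal NumPy sketch, not the authors' layer. It assumes each patch token has a 3D center and a feature vector, and that aggregation means projecting the features of the k nearest patches and max-pooling over them. The names `knn_indices` and `local_aggregation`, the single linear projection `weight`, and the choice of max pooling are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def knn_indices(centers: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k nearest patch centers for every patch.

    centers: (N, 3) array of patch centers.
    The search is brute force over all pairs for brevity, so this helper
    itself is quadratic in N; a spatial index would be used in practice.
    """
    diff = centers[:, None, :] - centers[None, :, :]   # (N, N, 3)
    dist2 = (diff ** 2).sum(axis=-1)                   # (N, N)
    # Each patch is its own nearest neighbor (distance 0).
    return np.argsort(dist2, axis=1)[:, :k]            # (N, k)


def local_aggregation(features: np.ndarray, centers: np.ndarray,
                      k: int, weight: np.ndarray) -> np.ndarray:
    """Hypothetical local aggregation layer (assumption, not the paper's).

    features: (N, C) patch tokens, weight: (C, C_out) projection.
    Each output token depends only on its k nearest patches, unlike
    self-attention, where every token attends to all N tokens.
    """
    idx = knn_indices(centers, k)                      # (N, k)
    neighbors = features[idx]                          # (N, k, C)
    hidden = np.maximum(neighbors @ weight, 0.0)       # ReLU, (N, k, C_out)
    return hidden.max(axis=1)                          # (N, C_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(64, 3))
    features = rng.normal(size=(64, 32))
    weight = rng.normal(size=(32, 48))
    tokens = local_aggregation(features, centers, k=8, weight=weight)
    print(tokens.shape)
```

With k fixed, the aggregation step touches N·k neighbor features rather than the N·N token pairs of self-attention, which is the kind of redundancy reduction the abstract argues for.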