Representation learning and generative learning, as reconstruction-based methods, have demonstrated their potential for mutual reinforcement across various domains. In point cloud processing, although existing studies have adopted training strategies from generative models to enhance representational capabilities, these methods are limited by their inability to genuinely generate 3D shapes. To explore the benefits of deeply integrating 3D representation learning and generative learning, we propose an innovative framework called \textit{Point-MGE}. Specifically, the framework first employs a vector-quantized variational autoencoder to reconstruct a neural field representation of 3D shapes, thereby learning discrete semantic features of point patches. Subsequently, we design a sliding masking ratio to smooth the transition from representation learning to generative learning. Moreover, our method demonstrates strong generalization capability in learning high-capacity models, achieving new state-of-the-art performance across multiple downstream tasks. In shape classification, Point-MGE achieves an accuracy of 94.2% (+1.0%) on the ModelNet40 dataset and 92.9% (+5.5%) on the ScanObjectNN dataset. Experimental results also confirm that Point-MGE can generate high-quality 3D shapes in both unconditional and conditional settings.
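The sliding masking ratio can be understood as a schedule that gradually increases the fraction of masked point patches over training: low ratios correspond to masked representation learning (most patches visible), while ratios approaching 1.0 correspond to generation from scratch. The sketch below illustrates this idea with a simple linear schedule; the endpoint values, the linear interpolation, and the helper names are illustrative assumptions, not the paper's exact formulation.

```python
import random

def sliding_mask_ratio(step, total_steps, lo=0.4, hi=1.0):
    """Linearly slide the masking ratio from `lo` to `hi` over training.

    Illustrative schedule only: a low ratio resembles masked
    representation learning, while a ratio near 1.0 approaches
    unconditional generation from fully masked input.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    return lo + (hi - lo) * t

def sample_mask(num_patches, ratio, rng=random):
    """Randomly choose which patch indices to mask at the given ratio."""
    k = int(round(num_patches * ratio))
    return set(rng.sample(range(num_patches), k))

# Early in training: mostly visible patches (representation learning).
# Late in training: almost everything masked (generative learning).
early = sample_mask(64, sliding_mask_ratio(0, 1000))
late = sample_mask(64, sliding_mask_ratio(1000, 1000))
```

Under this sketch, the model sees a smoothly shifting task distribution rather than an abrupt switch between the two objectives.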