In the field of 2D image generative modeling and representation learning, the Masked Generative Encoder (MAGE) has demonstrated the synergistic potential between generative modeling and representation learning. Inspired by this, we propose Point-MAGE to extend this concept to point cloud data. Specifically, the framework first uses a Vector Quantized Variational Autoencoder (VQVAE) to reconstruct a neural field representation of 3D shapes, thereby learning discrete semantic features of point patches. Then, by combining masked modeling with a variable masking ratio, we train for generation and representation learning simultaneously. Furthermore, our framework integrates seamlessly with existing point cloud self-supervised learning (SSL) models, improving their performance. We extensively evaluate the representation learning and generation capabilities of Point-MAGE. In shape classification, Point-MAGE achieves 94.2% accuracy on ModelNet40 and 92.9% (+1.3%) on ScanObjectNN. It also sets new state-of-the-art results in few-shot learning and part segmentation. Experimental results further confirm that Point-MAGE generates detailed, high-quality 3D shapes in both unconditional and conditional settings.
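The variable-masking-ratio idea mentioned above can be sketched in a few lines: instead of a fixed masking ratio as in standard masked autoencoders, each training step samples a ratio from a range, so the model sees both high-mask (generation-like) and low-mask (representation-like) regimes. The range `[0.5, 1.0]` and the function name below are illustrative assumptions, not the paper's reported hyperparameters.

```python
import numpy as np

def sample_patch_mask(num_patches, rng, lo=0.5, hi=1.0):
    """Sample a MAGE-style variable masking ratio and return a boolean mask
    over point-patch tokens (True = masked).

    lo/hi bound the ratio range; the [0.5, 1.0] default is an assumed
    illustration, not a value taken from the paper.
    """
    ratio = rng.uniform(lo, hi)                     # per-step random ratio
    num_masked = int(np.ceil(ratio * num_patches))  # how many tokens to hide
    mask = np.zeros(num_patches, dtype=bool)
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[masked_idx] = True
    return mask

# Example: mask a sequence of 64 point-patch tokens.
rng = np.random.default_rng(0)
mask = sample_patch_mask(64, rng)
```

At a high masking ratio the objective resembles generative token prediction; at a lower ratio it resembles standard masked representation learning, which is what lets one model serve both purposes.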