Large language models (LLMs) based on the generative pre-trained transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks. Inspired by the advances in GPT, we present PointGPT, a novel approach that extends the GPT concept to point clouds and addresses the challenges posed by their unordered nature, their low information density, and the gap between the pre-training task and downstream tasks. Specifically, we propose a point cloud auto-regressive generation task to pre-train transformer models. Our method partitions the input point cloud into multiple point patches and arranges them in an ordered sequence based on their spatial proximity. An extractor-generator based transformer decoder with a dual masking strategy then learns latent representations conditioned on the preceding point patches and predicts the next patch in an auto-regressive manner. This scalable approach allows high-capacity models to be learned that generalize well, achieving state-of-the-art performance on various downstream tasks. In particular, our approach achieves classification accuracies of 94.9% on the ModelNet40 dataset and 93.4% on the ScanObjectNN dataset, outperforming all other transformer models. Furthermore, our method attains new state-of-the-art accuracies on all four few-shot learning benchmarks.
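The partition-and-order step can be illustrated with a minimal NumPy sketch. The abstract states only that patches are arranged "based on their spatial proximity"; the specific choices here (farthest point sampling for patch centers, k-nearest neighbors for patch membership, and a Morton or Z-order curve for the ordering) are assumptions made for illustration, and the function names `farthest_point_sampling`, `morton_code`, and `ordered_patches` are hypothetical rather than taken from any released code.

```python
import numpy as np


def farthest_point_sampling(points, num_centers):
    """Greedily pick indices of points that are far apart (assumed center selection)."""
    n = points.shape[0]
    chosen = np.zeros(num_centers, dtype=np.int64)  # the first center is point 0
    dist = np.full(n, np.inf)
    for i in range(1, num_centers):
        delta = points - points[chosen[i - 1]]
        # Squared distance of every point to its nearest already-chosen center.
        dist = np.minimum(dist, np.einsum("ij,ij->i", delta, delta))
        chosen[i] = int(np.argmax(dist))
    return chosen


def morton_code(cells, bits):
    """Interleave the bits of integer (x, y, z) grid cells into one Z-order key."""
    codes = np.zeros(cells.shape[0], dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            bit = (cells[:, axis] >> b) & 1
            codes |= bit << (3 * b + axis)
    return codes


def ordered_patches(points, num_patches=8, patch_size=16, bits=10):
    """Split an (N, 3) cloud into patches and sort them along a Z-order curve."""
    centers = points[farthest_point_sampling(points, num_patches)]
    # k-nearest neighbors of each center form one patch.
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    knn = np.argsort(d2, axis=1)[:, :patch_size]
    patches = points[knn] - centers[:, None, :]  # coordinates relative to the center
    # Quantize centers to an integer grid, then sort by Morton code.
    lo, hi = points.min(axis=0), points.max(axis=0)
    scale = (2**bits - 1) / np.maximum(hi - lo, 1e-12)
    cells = np.floor((centers - lo) * scale).astype(np.int64)
    order = np.argsort(morton_code(cells, bits), kind="stable")
    return patches[order], centers[order]


rng = np.random.default_rng(0)
cloud = rng.random((256, 3))
patches, centers = ordered_patches(cloud)
# For auto-regressive training, patch k would be predicted from patches 0..k-1.
inputs, targets = patches[:-1], patches[1:]
```

A space-filling curve is used here because it turns 3D proximity into a 1D order cheaply: patches that are close on the curve tend to be close in space, although the converse does not always hold.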