We introduce an autoregressive generative model for 3D point cloud generation. Inspired by visual autoregressive modeling (VAR), we frame point cloud generation as an autoregressive up-sampling process. The resulting model, PointARU, progressively refines 3D point clouds from coarse to fine scales. PointARU is trained in two stages: it first learns multi-scale discrete representations of point clouds, and then trains an autoregressive transformer for next-scale prediction. To handle the inherently unordered and irregular structure of point clouds, we incorporate specialized point-based up-sampling network modules in both stages, and in the second stage we integrate 3D absolute positional encoding computed from the point cloud decoded at each scale. Our model surpasses state-of-the-art (SoTA) diffusion-based approaches in both generation quality and parameter efficiency across diverse experimental settings, marking a new milestone for autoregressive methods in 3D point cloud generation. PointARU also performs strongly at completing partial 3D shapes and up-sampling sparse point clouds, outperforming existing generative models on both tasks.
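To make the coarse-to-fine idea concrete, the sketch below builds a nested sequence of point sets at increasing resolution, the kind of sequence a next-scale predictor could be trained to extend. It is only an illustration under stated assumptions: the abstract does not say how PointARU constructs its scales, so farthest point sampling is a stand-in chosen here, and the names `farthest_point_sampling` and `build_scale_pyramid` are hypothetical. The learned discrete tokenizer, the up-sampling modules, and the transformer are not modeled.

```python
import math
import random


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices; each new point is the one farthest from those already picked.

    Assumption: this sampler is an illustrative choice, not taken from the paper.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


def build_scale_pyramid(points, sizes):
    """Return one subset per entry of `sizes` (hypothetical helper).

    The greedy order is computed once, so every coarser level is a prefix of
    every finer level: the scales are nested from coarse to fine.
    """
    order = farthest_point_sampling(points, max(sizes))
    return [[points[i] for i in order[:s]] for s in sizes]


# Toy cloud: 512 Gaussian-distributed 3D points (synthetic data for illustration only).
random.seed(0)
cloud = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(512)]
pyramid = build_scale_pyramid(cloud, sizes=[16, 64, 256, 512])
print([len(level) for level in pyramid])  # [16, 64, 256, 512]
```

Because the greedy order is shared, each level contains all points of the previous one, so moving to the next scale only adds points. In an autoregressive setup of the kind the abstract describes, the model would condition on the coarser levels when predicting the next finer one.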