We scale 3D shape generation to an unprecedented size. By adapting an auto-regressive model and drawing on large language models, we develop Argus-3D, a model with 3.6 billion trainable parameters, which makes it the largest 3D shape generation model to date. Our approach addresses limitations of existing methods by improving the quality and diversity of generated 3D shapes. To make high-resolution 3D shape generation tractable, the model uses tri-plane features as latent representations, which reduces computational complexity, and introduces a discrete codebook that quantizes these representations efficiently. Building on transformers, it supports multi-modal conditional generation and produces diverse, visually compelling 3D shapes. To train the model, we combine publicly available 3D datasets into a collection of approximately 900,000 objects drawn from ModelNet40, ShapeNet, Pix3D, 3D-Future, and Objaverse. This diverse data exposes the model to a wide range of object variations, strengthening its ability to generate high-quality, diverse 3D shapes. Extensive experiments demonstrate that our approach significantly improves the visual quality of generated 3D shapes. By scaling up 3D generation, introducing a new approach to latent representation learning, and using transformers for multi-modal conditional generation, this work opens new possibilities for applications in gaming, virtual reality, product design, and other domains that demand high-quality and diverse 3D objects.
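To make the tri-plane and codebook idea concrete, the following is a minimal sketch of nearest-neighbor vector quantization over three feature planes. It is not the Argus-3D implementation: the function names, the plane names (`xy`, `xz`, `yz`), the tiny sizes, and the random stand-in features are all assumptions chosen for illustration. It only shows how continuous plane features could be mapped to discrete indices that a transformer might then model as a token sequence. As a rough intuition for the complexity claim, three planes at resolution R hold 3·R² feature vectors, whereas a dense grid at the same resolution holds R³.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize_triplane(planes, codebook):
    """Map every feature vector on every plane to a codebook index.

    planes: dict mapping a plane name to an H x W grid of feature vectors.
    Returns a flat token list: planes in sorted-name order, row-major.
    """
    tokens = []
    for name in sorted(planes):
        for row in planes[name]:
            for vec in row:
                tokens.append(nearest_code(vec, codebook))
    return tokens


def dequantize(tokens, codebook):
    """Look up the codebook vector for each token index."""
    return [codebook[t] for t in tokens]


# Hypothetical sizes, far smaller than anything a real model would use.
DIM, CODES, RES = 4, 8, 2
rng = random.Random(0)

# Random stand-ins for a learned codebook and encoder outputs.
codebook = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(CODES)]
planes = {
    name: [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(RES)]
           for _ in range(RES)]
    for name in ("xy", "xz", "yz")
}

tokens = quantize_triplane(planes, codebook)
recovered = dequantize(tokens, codebook)
print(len(tokens), "tokens from 3 planes of", RES, "x", RES)
```

In a trained system the codebook would be learned jointly with an encoder and decoder, and the token sequence would be what the auto-regressive transformer predicts; here both are random placeholders.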