Recent advances in sequence modeling have produced the Mamba architecture, whose selective state space approach offers a promising route to efficient long-sequence handling. However, its application to 3D shape generation, particularly at high resolutions, remains underexplored. Traditional diffusion transformers (DiT) with self-attention, despite their potential, face scalability challenges: attention cost grows quadratically with input length, and the number of voxel tokens itself grows cubically with resolution. This complexity becomes a significant hurdle at high voxel resolutions. To address this challenge, we introduce Diffusion Mamba (DiM-3D), a novel diffusion architecture tailored for 3D point cloud generation. The architecture forgoes traditional attention mechanisms and instead uses the inherent efficiency of Mamba to maintain linear complexity with respect to sequence length. DiM-3D offers fast inference and substantially lower computational cost, measured in GFLOPs, thereby addressing the key scalability issues of prior models. Our empirical results on the ShapeNet benchmark demonstrate that DiM-3D achieves state-of-the-art performance in generating high-fidelity and diverse 3D shapes. DiM-3D also shows superior capabilities in tasks such as 3D point cloud completion. These results demonstrate the model's scalability and its efficiency in generating the detailed, high-resolution voxels needed for advanced 3D shape modeling. Together, these findings illustrate the scalability and efficiency of the Diffusion Mamba framework in 3D shape generation, setting a new standard for the field and paving the way for future work on high-resolution 3D modeling.
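The scaling argument in the abstract can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the paper: the patch size, hidden width, and state size are hypothetical values, and the cost formulas are rough multiply-add counts for a single layer, not measured GFLOPs of DiM-3D or DiT.

```python
def num_tokens(resolution: int, patch: int) -> int:
    """Tokens obtained by splitting a resolution^3 voxel grid into patch^3 cubes."""
    if resolution % patch != 0:
        raise ValueError("resolution must be divisible by patch")
    return (resolution // patch) ** 3


def attention_cost(n: int, dim: int) -> int:
    """Rough multiply-add count for the QK^T and AV products of one attention layer."""
    return 2 * n * n * dim


def scan_cost(n: int, dim: int, state: int) -> int:
    """Rough multiply-add count for one linear-time state space scan."""
    return n * dim * state


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    dim, state, patch = 384, 16, 4
    for res in (16, 32, 64):
        n = num_tokens(res, patch)
        ratio = attention_cost(n, dim) / scan_cost(n, dim, state)
        print(f"res={res:3d} tokens={n:5d} attention/scan ratio={ratio:.1f}")
```

Under these assumed settings, doubling the voxel resolution multiplies the token count by 8, the attention estimate by 64, and the scan estimate by 8, so the gap between the two widens as resolution grows.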