Denoising diffusion models have demonstrated outstanding results in 2D image generation, yet replicating their success in 3D shape generation remains a challenge. In this paper, we propose leveraging multi-view depth, which represents complex 3D shapes in a 2D data format that is easy to denoise. We pair this representation with a diffusion model, MVDD, that is capable of generating high-quality dense point clouds of 20K+ points with fine-grained details. To enforce 3D consistency in multi-view depth, we introduce an epipolar line segment attention that conditions the denoising step for a view on its neighboring views. Additionally, a depth fusion module is incorporated into the diffusion steps to further ensure the alignment of depth maps. When augmented with surface reconstruction, MVDD can also produce high-quality 3D meshes. Furthermore, MVDD stands out in other tasks such as depth completion, and can serve as a 3D prior, significantly boosting many downstream tasks, such as GAN inversion. Extensive experiments yield state-of-the-art results, demonstrating MVDD's excellent ability in 3D shape generation and depth completion, as well as its potential as a 3D prior for downstream tasks.
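To illustrate how a set of multi-view depth maps encodes a 3D shape, the following minimal sketch back-projects each depth map through a pinhole camera and concatenates the results into one point cloud. It is not the MVDD denoiser, its epipolar line segment attention, or its depth fusion module. The function names `unproject_depth` and `fuse_views`, the pinhole intrinsics `(fx, fy, cx, cy)`, the 4x4 camera-to-world pose matrices, and the convention that non-positive depth marks background are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def unproject_depth(depth: Matrix, fx: float, fy: float, cx: float, cy: float,
                    cam_to_world: Matrix) -> List[Point]:
    """Lift one depth map to world-space points (assumed pinhole camera)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # assumption: non-positive depth means background
                continue
            # Pixel (u, v) at depth z -> homogeneous camera-space point.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0)
            # Apply the top three rows of the 4x4 camera-to-world transform.
            x, y, w = (sum(cam_to_world[r][c] * cam[c] for c in range(4))
                       for r in range(3))
            points.append((x, y, w))
    return points


def fuse_views(depths: Sequence[Matrix],
               intrinsics: Tuple[float, float, float, float],
               poses: Sequence[Matrix]) -> List[Point]:
    """Concatenate the back-projected points of all views into one cloud."""
    fx, fy, cx, cy = intrinsics
    cloud: List[Point] = []
    for depth, pose in zip(depths, poses):
        cloud.extend(unproject_depth(depth, fx, fy, cx, cy, pose))
    return cloud
```

Under these assumptions, the number of points in the fused cloud equals the total number of foreground pixels across all views, which is why a modest number of views at moderate resolution can already yield a dense cloud.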