DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model for optimizing Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image-space supervision on NeRF, leading to low-quality 3D models and long processing times. In this paper, we address these limitations with a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate its optimization with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we then optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with its image-conditioned generation capabilities, our method gives users new ways to control 3D synthesis, opening up new avenues for various creative applications.