The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or a text prompt. However, 3D objects reconstructed with state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline that generates a high-quality 3D NeRF model from a text prompt or a single image. DITTO-NeRF first constructs a high-quality partial 3D object for a limited range of in-boundary (IB) angles using the given or text-generated 2D image from the frontal view, and then iteratively reconstructs the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles first, outer-boundary (OB) angles later), and masks (object to background boundary) so that high-quality information from the IB angles can be propagated to the OB angles. DITTO-NeRF outperforms state-of-the-art methods in terms of fidelity and diversity, both qualitatively and quantitatively, with much faster training times than prior art on image/text-to-3D such as DreamFusion and NeuralLift-360.
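As a rough illustration of the progressive schemes in scale and angle, the following sketch builds a coarse-to-fine training schedule. It is only a sketch under stated assumptions, not the DITTO-NeRF implementation: the names `Stage` and `progressive_schedule`, the resolutions, the IB half-angle, the linear widening of the angle range, and the nesting of angle steps inside each resolution are all hypothetical choices made for illustration. The mask progression (object to background boundary) is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of a hypothetical progressive schedule."""
    resolution: int          # render resolution in pixels (illustrative)
    half_angle_deg: float    # azimuth half-range sampled around the frontal view


def progressive_schedule(resolutions, ib_half_angle_deg, num_angle_steps):
    """Build a low-to-high resolution, IB-to-OB angle schedule.

    Assumption: at each resolution the sampled azimuth range widens
    linearly from the in-boundary (IB) half-angle to the full 180 degrees,
    which covers the outer-boundary (OB) views.
    """
    stages = []
    for res in sorted(resolutions):  # low resolution first
        for k in range(num_angle_steps):
            # Fraction of the way from the IB range to the full range.
            frac = k / (num_angle_steps - 1) if num_angle_steps > 1 else 1.0
            half = ib_half_angle_deg + frac * (180.0 - ib_half_angle_deg)
            stages.append(Stage(resolution=res, half_angle_deg=half))
    return stages


if __name__ == "__main__":
    # Illustrative values only; the abstract does not specify them.
    for stage in progressive_schedule([64, 128], 30.0, 3):
        print(stage)
```

In a schedule like this, early stages would sample only views near the frontal image, where the partial 3D object is most reliable, and later stages would add the views that the inpainting model has to fill in.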