In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it requires estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and a diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further improves realism with the diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior work by a large margin, producing faithful reconstructions and impressive visual quality. Our method is the first attempt to achieve high-quality 3D creation from a single image for general objects, and it enables various applications such as text-to-3D creation and texture editing.
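As an illustration of the first-stage idea, the sketch below shows one way the supervision could be switched by camera view: a reconstruction loss against the reference image at the frontal view, and a prior-based loss at novel views. It is a minimal sketch under stated assumptions, not the Make-It-3D implementation. The names `reference_loss`, `stage_one_loss`, `toy_prior`, and `prior_weight` are hypothetical, the mean squared error is an assumed choice of reconstruction loss, and `toy_prior` is a trivial stand-in for the diffusion prior rather than a real diffusion model.

```python
from typing import Callable, Sequence

# Flattened pixel intensities; an assumed stand-in for a real image tensor.
Image = Sequence[float]


def reference_loss(rendered: Image, reference: Image) -> float:
    """Mean squared error between a rendering and the reference image."""
    if len(rendered) != len(reference):
        raise ValueError("rendered and reference must have the same number of pixels")
    return sum((r - t) ** 2 for r, t in zip(rendered, reference)) / len(reference)


def stage_one_loss(
    rendered: Image,
    reference: Image,
    is_reference_view: bool,
    prior_loss: Callable[[Image], float],
    prior_weight: float = 1.0,
) -> float:
    """Select the supervision signal according to the camera view.

    Frontal (reference) view: compare the rendering with the input image.
    Novel view: defer to a prior. `prior_loss` is a hypothetical callable
    standing in for diffusion-based guidance.
    """
    if is_reference_view:
        return reference_loss(rendered, reference)
    return prior_weight * prior_loss(rendered)


def toy_prior(img: Image) -> float:
    """Toy stand-in for the diffusion prior: mean squared distance from mid-grey."""
    return sum((p - 0.5) ** 2 for p in img) / len(img)


if __name__ == "__main__":
    reference = [0.0, 1.0]
    rendered = [0.0, 0.0]
    # Loss at the frontal view uses the reference image.
    print(stage_one_loss(rendered, reference, True, toy_prior))
    # Loss at a novel view uses the (toy) prior, scaled by prior_weight.
    print(stage_one_loss(rendered, reference, False, toy_prior, prior_weight=2.0))
```

In a real pipeline the rendering would come from the radiance field at a sampled camera pose, and the prior term would be computed by a pretrained diffusion model. Both are replaced here by plain lists and a toy function so that the control flow can be read in isolation.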