We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference-view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion to encourage consistent appearance across views and monocular depth regularization to prevent degenerate solutions. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123.
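To make the role of a single trade-off parameter concrete, the following is a minimal sketch and not the released implementation. It assumes the two priors each yield a scalar guidance loss for a novel view and that the parameter blends them as a convex combination. The function name `combined_prior_loss`, the argument `tradeoff`, and the convex form itself are hypothetical choices made for illustration.

```python
def combined_prior_loss(loss_2d: float, loss_3d: float, tradeoff: float) -> float:
    """Blend two prior losses with one trade-off weight (illustrative only).

    Assumption: `tradeoff` lies in [0, 1]; 1.0 uses only the 2D-prior loss
    and 0.0 uses only the 3D-prior loss. The actual weighting scheme used
    by the method may differ.
    """
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must lie in [0, 1]")
    # Convex combination: the two weights always sum to one.
    return tradeoff * loss_2d + (1.0 - tradeoff) * loss_3d


if __name__ == "__main__":
    # Sweep the weight to see how the blended loss moves between the two priors.
    for w in (0.0, 0.5, 1.0):
        print(w, combined_prior_loss(2.0, 4.0, w))
```

Under this assumed form, sweeping `tradeoff` from one extreme to the other moves the novel-view guidance smoothly from one prior to the other, which is the kind of single-knob control the abstract describes.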