In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diverse 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from long per-case optimization times and inconsistency issues. Recent works address these problems and generate better 3D results by either fine-tuning a multi-view diffusion model or training a fast feed-forward model. However, their results still lack intricate textures and complex geometries because of inconsistency and limited generation resolution. To simultaneously achieve high fidelity, consistency, and efficiency in single-image-to-3D generation, we propose Unique3D, a framework with three components: a multi-view diffusion model with a corresponding normal diffusion model, which generate multi-view images together with their normal maps; a multi-level upscale process that progressively increases the resolution of the generated orthographic multi-view images; and an instant and consistent mesh reconstruction algorithm called ISOMER, which fully integrates the color and geometric priors into the mesh results. Extensive experiments demonstrate that Unique3D significantly outperforms other image-to-3D baselines in terms of geometric and textural details.