Reconstructing 3D objects from a single image under the guidance of pretrained diffusion models has shown promising results. However, because existing methods apply a rigid, case-agnostic strategy, they generalize poorly to arbitrary inputs and their reconstructions still lack 3D consistency. In this work, we propose Consistent123, a case-aware two-stage method that reconstructs highly consistent 3D assets from one image using both 2D and 3D diffusion priors. In the first stage, Consistent123 uses only 3D structural priors so that geometry is sufficiently exploited, and a CLIP-based, case-aware adaptive detection mechanism is embedded in this process. In the second stage, 2D texture priors are introduced and progressively take on the dominant guiding role, refining the fine details of the 3D model. This design follows the way guidance requirements evolve during reconstruction, adaptively providing adequate 3D geometric initialization and suitable 2D texture refinement for different objects. As a result, Consistent123 produces highly 3D-consistent reconstructions and generalizes well across a wide variety of objects. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art image-to-3D methods. See https://Consistent123.github.io for a more comprehensive exploration of our generated 3D assets.
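To make the two-stage guidance idea concrete, the following is a minimal Python sketch of how such a schedule could be organized. It is an illustration under stated assumptions, not the implementation of Consistent123: the abstract does not specify the detection criterion or the weighting schedule, so the plateau test on CLIP similarity (`stage_one_converged`), the linear ramp in `guidance_weights`, and all parameter names and defaults are hypothetical.

```python
from typing import List, Tuple


def stage_one_converged(clip_sims: List[float], window: int = 3, tol: float = 0.01) -> bool:
    """Hypothetical stop rule for stage 1 (assumed, not from the paper).

    `clip_sims` holds CLIP similarity scores recorded at successive checks.
    Returns True when the gain over the last `window` checks falls below `tol`.
    """
    if len(clip_sims) <= window:
        return False  # not enough history to judge a plateau
    return clip_sims[-1] - clip_sims[-1 - window] < tol


def guidance_weights(step: int, stage_one_end: int, total_steps: int) -> Tuple[float, float]:
    """Return (w_3d, w_2d) for an optimization step.

    Stage 1 (step < stage_one_end): only the 3D prior guides optimization.
    Stage 2: the 2D weight ramps linearly from 0 to 1 (an assumed schedule).
    """
    if step < stage_one_end:
        return 1.0, 0.0
    span = max(total_steps - stage_one_end, 1)  # guard against division by zero
    w_2d = min((step - stage_one_end) / span, 1.0)
    return 1.0 - w_2d, w_2d


if __name__ == "__main__":
    # Made-up similarity scores that rise and then flatten.
    sims = [0.5, 0.6, 0.7, 0.75, 0.752, 0.753, 0.754]
    print(stage_one_converged(sims))
    for s in (0, 100, 150, 200):
        print(s, guidance_weights(s, stage_one_end=100, total_steps=200))
```

In this sketch the switch point is decided per object by the plateau test rather than fixed in advance, which is one simple way to read "case-aware"; the actual mechanism may differ.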