Recent diffusion-based single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations. However, these methods usually struggle to produce high-fidelity 3D models and frequently yield excessively blurred textures. We attribute this issue to insufficient consideration of cross-view consistency during the diffusion process, which results in significant disparities between views and ultimately leads to blurred 3D representations. In this paper, we address this issue by comprehensively exploiting multi-view priors in both the conditioning and diffusion procedures to produce consistent, detail-rich portraits. From the conditioning standpoint, we propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the consistency of the generated multi-view portraits. From the diffusion perspective, considering the significant impact of the diffusion noise distribution on detailed texture generation, we propose a Multi-View Noise Resampling Strategy, integrated within the optimization process, that leverages cross-view priors to enhance representation consistency. Extensive experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image. The project page is at \url{https://haoran-wei.github.io/Portrait-Diffusion}.
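To make the noise-resampling idea concrete, the sketch below shows one minimal way cross-view noise correlation could be imposed: each view's independent Gaussian noise is blended with a shared noise map and renormalized to unit variance, so that the per-view noise distributions are no longer independent across views. This is an illustrative assumption, not the paper's actual strategy; the function name, `shared_weight` parameter, and blending scheme are all hypothetical.

```python
import numpy as np

def resample_multiview_noise(per_view_noise, shared_weight=0.5, seed=0):
    """Hypothetical sketch: correlate per-view diffusion noise across views.

    per_view_noise: array of shape (V, H, W, C), one i.i.d. Gaussian noise
    map per view. Each view's noise is mixed with a single shared noise map,
    then rescaled so the result remains (approximately) unit-variance, as
    standard diffusion samplers expect.
    """
    rng = np.random.default_rng(seed)
    # One noise map shared by all V views.
    shared = rng.standard_normal(per_view_noise.shape[1:])
    a, b = shared_weight, 1.0 - shared_weight
    mixed = a * shared[None] + b * per_view_noise
    # Var(a*x + b*y) = a^2 + b^2 for independent unit-variance x, y,
    # so divide by sqrt(a^2 + b^2) to restore unit variance.
    return mixed / np.sqrt(a * a + b * b)
```

With `shared_weight=0` the views keep fully independent noise; with `shared_weight=1` every view receives the identical noise map. Intermediate values trade per-view diversity for cross-view agreement.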