ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image

Although recent work has achieved great success in image-to-3D object generation, generating high-quality, high-fidelity 3D heads from a single image remains a major challenge. Previous text-based methods for 3D head generation are limited by what text descriptions can express, while image-based methods struggle to produce high-quality head geometry. To address this problem, we propose ID-Sculpt, a novel framework that generates high-quality 3D heads while preserving their identities. Our work incorporates the identity information of the portrait image into three stages: 1) geometry initialization, 2) geometry sculpting, and 3) texture generation. Given a reference portrait image, we first align its identity features with text features to realize ID-aware guidance enhancement, which contains control signals representing the face information. We then use the Canny edge map and ID features of the portrait image, together with a pre-trained text-to-normal/depth diffusion model, to generate ID-aware geometry supervision, and employ 3D-GAN inversion to generate an ID-aware geometry initialization. Furthermore, with the ability to inject identity information into 3D head generation, we use the ID-aware guidance to compute ID-aware Score Distillation (ISD) for geometry sculpting. For texture generation, we adopt ID Consistent Texture Inpainting and Refinement, which progressively expands the view for texture inpainting to obtain an initial UV texture map. We then use the ID-aware guidance to provide image-level supervision on noisy multi-view images to obtain a refined texture map. Extensive experiments demonstrate that our method generates high-quality 3D heads with accurate geometry and texture from a single in-the-wild portrait image.
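
The abstract does not state the ISD formula, so the following is only a minimal sketch of the generic score distillation gradient with classifier-free guidance that such a method would plausibly build on, not the authors' implementation. All names here (`score_distillation_grad`, `toy_predict_noise`, `cond`) are hypothetical, and the toy noise predictor is a stand-in for a real diffusion model: it assumes the clean sample equals the conditioning vector, which plays the role of the ID-aware guidance.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def toy_predict_noise(x_t, alpha_bar, cond):
    """Hypothetical stand-in for a diffusion model's noise prediction.

    It assumes the clean sample equals `cond` (or zeros when unconditioned)
    and returns the noise that would explain x_t under that assumption.
    """
    target = cond if cond is not None else [0.0] * len(x_t)
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - a * c) / b for xt, c in zip(x_t, target)]


def score_distillation_grad(x0, alpha_bar, predict_noise, cond,
                            guidance_scale=1.0, rng=None):
    """Generic score distillation gradient w.r.t. a rendered sample x0.

    grad = w(t) * (eps_hat - eps), where eps_hat combines the unconditioned
    and conditioned predictions via classifier-free guidance.
    """
    rng = rng or random.Random()
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, eps, alpha_bar)
    eps_uncond = predict_noise(x_t, alpha_bar, None)
    eps_cond = predict_noise(x_t, alpha_bar, cond)
    eps_hat = [u + guidance_scale * (c - u)
               for u, c in zip(eps_uncond, eps_cond)]
    w = 1.0 - alpha_bar  # one common choice of timestep weighting
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]


# Usage: a gradient-descent step on x0 moves it toward the conditioning target.
render = [0.0, 0.0, 0.0]
id_target = [1.0, 1.0, 1.0]
grad = score_distillation_grad(render, 0.5, toy_predict_noise, id_target,
                               rng=random.Random(0))
render = [x - 0.1 * g for x, g in zip(render, grad)]
```

With this toy predictor and a guidance scale of 1, the sampled noise cancels out and the gradient is proportional to the difference between the render and the conditioning target, so it vanishes once the two agree. In a real pipeline the sample would be a rendered normal or depth map and the gradient would be backpropagated to the mesh parameters.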
