The creation of 3D human face avatars from a single unconstrained image is a fundamental task that underlies numerous real-world vision and graphics applications. Despite significant progress in generative models, existing methods are either ill-suited by design for human faces or fail to generalise from their restrictive training domain to unconstrained facial images. To address these limitations, we propose a novel model, Gen3D-Face, which generates 3D human faces from a single unconstrained input image within a multi-view consistent diffusion framework. Given an input image, our model first produces multi-view images, followed by neural surface reconstruction. To incorporate face geometry information while preserving generalisation to in-the-wild inputs, we estimate a subject-specific mesh directly from the input image, enabling training and evaluation without ground-truth 3D supervision. Importantly, we introduce a multi-view joint generation scheme to enhance appearance consistency across different views. To the best of our knowledge, this is the first attempt at, and benchmark for, creating photorealistic 3D human face avatars from single images for generic human subjects across domains. Extensive experiments demonstrate the efficacy and superiority of our method over previous alternatives for out-of-domain single-image 3D face generation, as well as top-ranking competitive performance in the in-domain setting.