Text-guided 3D face synthesis has achieved remarkable results by leveraging text-to-image (T2I) diffusion models. However, most existing works focus solely on direct generation and ignore editing, which prevents them from synthesizing customized 3D faces through iterative adjustments. In this paper, we propose a unified text-guided framework that covers both face generation and editing. In the generation stage, we propose a geometry-texture decoupled generation scheme to mitigate the loss of geometric detail caused by coupling. Decoupling also lets us use the generated geometry as a condition for texture generation, yielding results with closely aligned geometry and texture. We further employ a fine-tuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update the facial geometry or texture according to the text. To enable sequential editing, we introduce a UV domain consistency preservation regularization that prevents unintended changes to irrelevant facial attributes. In addition, we propose a self-guided consistency weight strategy that improves editing efficacy while preserving consistency. Comprehensive experiments demonstrate the superiority of our method in face synthesis. Project page: https://faceg2e.github.io/.