3D face editing is a significant task in multimedia, which aims to manipulate 3D face models under various control signals. The success of 3D-aware GANs, which learn expressive 3D models from single-view 2D images alone, has encouraged researchers to discover semantic editing directions in their latent spaces. However, previous methods struggle to balance quality, efficiency, and generalization. To address this, we explore introducing the strengths of diffusion models into 3D-aware GANs. In this paper, we present Face Clan, a fast and text-general approach for generating and manipulating 3D faces from arbitrary attribute descriptions. To achieve disentangled editing, we propose to diffuse the latent codes under a pair of opposite prompts and estimate a mask indicating the region of interest on the latent codes. Based on this mask, we then denoise the masked latent codes to reveal the editing direction. Our method offers precisely controllable manipulation, allowing users to intuitively customize regions of interest via text descriptions. Experiments demonstrate the effectiveness and generalization of Face Clan across various pre-trained GANs. It provides an intuitive and widely applicable approach to text-guided face editing, contributing to the landscape of multimedia content creation.
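The mask-then-denoise idea described above can be illustrated with a minimal sketch. The following Python/PyTorch snippet is not the paper's implementation: `denoiser`, `estimate_mask`, `masked_edit_direction`, and the text embeddings `emb_pos`/`emb_neg` are hypothetical stand-ins for the actual latent diffusion model and text encoder, under the assumption that the mask is estimated from where the noise predictions for the two opposite prompts disagree most.

```python
import torch

# Hypothetical stand-in for a latent diffusion model conditioned on a
# text prompt embedding; its interface and behavior are assumptions,
# not the paper's actual API.
def denoiser(z_t, t, prompt_embedding):
    # A real model would predict the noise added to z at step t.
    return 0.1 * (z_t - prompt_embedding)

def estimate_mask(z, t, emb_pos, emb_neg, quantile=0.8):
    """Diffuse the latent once, then compare noise predictions under a
    pair of opposite prompts; the latent dimensions where predictions
    disagree most are taken as the region of interest."""
    noise = torch.randn_like(z)
    z_t = z + t * noise                       # simplified forward diffusion
    eps_pos = denoiser(z_t, t, emb_pos)
    eps_neg = denoiser(z_t, t, emb_neg)
    diff = (eps_pos - eps_neg).abs()
    return (diff >= diff.quantile(quantile)).float()   # binary mask

def masked_edit_direction(z, t, emb_pos, emb_neg):
    mask = estimate_mask(z, t, emb_pos, emb_neg)
    noise = torch.randn_like(z)
    z_t = z + t * noise
    z_denoised = z_t - t * denoiser(z_t, t, emb_pos)   # one denoising step
    # Only the masked latent dimensions move; the rest stay untouched,
    # which is what keeps the edit disentangled.
    return mask * (z_denoised - z)

# Usage with random stand-ins for a 3D-aware GAN latent code and the
# embeddings of a pair of opposite prompts.
z = torch.randn(1, 512)
emb_pos, emb_neg = torch.randn(1, 512), torch.randn(1, 512)
z_edited = z + masked_edit_direction(z, t=0.5, emb_pos=emb_pos, emb_neg=emb_neg)
```

The design choice the sketch highlights is that disentanglement comes from the mask: denoising is applied only to the latent dimensions where the opposite prompts conflict, so the edit is confined to the attribute-relevant region of the latent code.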