Capitalizing on recent advances in image generation models, existing controllable face image synthesis methods can generate high-fidelity images with some degree of controllability, e.g., over the shapes, expressions, textures, and poses of the generated face images. However, these methods build on controllable 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF++), built on top of EG3D [1], a recent tri-plane-based generative model, that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh. To achieve accurate control over the fine-grained 3D face shapes of the synthesized images, we additionally incorporate a 3D landmark loss and a volume warping loss into our synthesis framework. Experiments validate the effectiveness of the proposed method, which generates high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.