Reenacting facial images is an important task with numerous applications. We propose IFaceUV, a fully differentiable pipeline that combines 2D and 3D information to perform facial reenactment. Three-dimensional morphable face models (3DMMs) and their corresponding UV maps are used to intuitively control facial motion and texture, respectively. Techniques based on 2D image warping are additionally required to compensate for components that 3DMMs do not model, such as the background, ears, and hair. In our pipeline, we first extract 3DMM parameters and corresponding UV maps from the source and target images. The initial UV maps are then refined by a UV map refinement network and rendered to an image using the motion-manipulated 3DMM parameters. In parallel, we warp the source image according to the 2D flow field produced by a 2D warping network. The rendered and warped images are combined in a final editing network to generate the reenacted image. We additionally test our model on the audio-driven facial reenactment task. Extensive qualitative and quantitative experiments demonstrate the strong performance of our method compared with other state-of-the-art methods.
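To make the data flow of the pipeline concrete, the following is a minimal, dependency-free sketch of three of its steps: mixing 3DMM parameters, warping the source image with a dense 2D flow field, and combining the rendered and warped images. It is an illustration under stated assumptions, not the IFaceUV implementation. All function names and parameter keys (`transfer_motion`, `warp_with_flow`, `composite`, `"expression"`, `"pose"`) are hypothetical, the choice of which parameters count as motion is assumed, images are single-channel lists of rows, and a fixed per-pixel blend with a face mask stands in for the learned final editing network.

```python
import math


def transfer_motion(source_params, target_params, motion_keys=("expression", "pose")):
    """Keep the source's parameters but take the motion-related ones from the target.

    The key names are assumptions made for this sketch.
    """
    mixed = dict(source_params)
    for key in motion_keys:
        mixed[key] = target_params[key]
    return mixed


def bilinear_sample(image, x, y):
    """Sample a single-channel image at a real-valued (x, y), clamping to the border."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
    bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def warp_with_flow(image, flow):
    """Backward-warp an image: output pixel (x, y) samples the input at (x + dx, y + dy).

    flow[y][x] is a (dx, dy) pair in pixels.
    """
    h, w = len(image), len(image[0])
    return [
        [bilinear_sample(image, x + flow[y][x][0], y + flow[y][x][1]) for x in range(w)]
        for y in range(h)
    ]


def composite(rendered, warped, face_mask):
    """Blend per pixel: mask 1 keeps the rendered face, mask 0 keeps the warped source.

    This fixed blend is only a stand-in for a learned editing network.
    """
    return [
        [m * r + (1 - m) * v for r, v, m in zip(r_row, v_row, m_row)]
        for r_row, v_row, m_row in zip(rendered, warped, face_mask)
    ]
```

In this sketch the face region would come from the 3DMM rendering, while regions the 3DMM does not cover (background, ears, hair) fall back to the warped source image. Bilinear sampling is used because it is differentiable with respect to the flow almost everywhere, which is the property that end-to-end training of a warping network relies on.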