We propose a method to transfer pose and expression between face images. Given a source and a target face portrait, the model produces an output image in which the pose and expression of the source face are transferred onto the target identity. The architecture consists of two encoders and a mapping network that project the two inputs into the latent space of StyleGAN2, which then generates the output. Training is self-supervised on video sequences of many individuals; no manual labeling is required. Our model also enables the synthesis of random identities with controllable pose and expression, and it runs at close to real-time speed.
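To make the described data flow concrete, here is a minimal PyTorch sketch of the two-encoder-plus-mapping-network pipeline. All names and dimensions (`Encoder`, `MappingNetwork`, `LATENT_DIM`, the 18-layer W+ space, the stub generator) are illustrative assumptions, not the paper's actual implementation; a real system would plug in a pretrained StyleGAN2 generator.

```python
import torch
import torch.nn as nn

# Assumed dimensions; the abstract does not specify these values.
LATENT_DIM = 512   # per-layer StyleGAN2 latent size (w)
NUM_LAYERS = 18    # W+ style layers for a 1024x1024 StyleGAN2
EMBED_DIM = 256    # encoder output size (assumed)

class Encoder(nn.Module):
    """Toy convolutional encoder standing in for the pose/expression
    and identity encoders; the real backbones are not specified."""
    def __init__(self, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        return self.net(x)

class MappingNetwork(nn.Module):
    """Projects the concatenated source/target embeddings into the
    W+ latent space of StyleGAN2 (one w vector per generator layer)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * EMBED_DIM, 512), nn.ReLU(),
            nn.Linear(512, NUM_LAYERS * LATENT_DIM),
        )

    def forward(self, pose_emb, id_emb):
        w = self.mlp(torch.cat([pose_emb, id_emb], dim=1))
        return w.view(-1, NUM_LAYERS, LATENT_DIM)  # W+ codes

class StubGenerator(nn.Module):
    """Placeholder for a pretrained, frozen StyleGAN2 generator so
    the sketch runs standalone."""
    def forward(self, w_plus):
        return torch.zeros(w_plus.shape[0], 3, 1024, 1024)

pose_encoder, id_encoder = Encoder(), Encoder()
mapper, generator = MappingNetwork(), StubGenerator()

source = torch.randn(1, 3, 256, 256)  # drives pose and expression
target = torch.randn(1, 3, 256, 256)  # provides the identity
w_plus = mapper(pose_encoder(source), id_encoder(target))
output = generator(w_plus)            # reenacted face image
print(output.shape)                   # torch.Size([1, 3, 1024, 1024])
```

Sampling a random W+ code instead of encoding a target image would, under this sketch, correspond to the synthesis of a random identity with controllable pose and expression.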