In recent years, image generative models have played an increasingly important role in facial reenactment. Such models are usually subject-agnostic and trained on domain-wide datasets. The appearance of the reenacted individual is learned from a single image, so the full breadth of that individual's appearance is not captured, and these methods must resort to hallucinating details that may be unfaithful to the subject. Thanks to recent advancements, it is now possible to train a personalized generative model tailored specifically to a given individual. In this paper, we propose a novel method for facial reenactment using a personalized generator. We train the generator on frames from a short, yet varied, self-scan video captured with a simple commodity camera. Images synthesized by the personalized generator are guaranteed to preserve identity. The premise of our work is that the task of reenactment is thus reduced to accurately mimicking head poses and expressions. To this end, we locate the desired frames in the latent space of the personalized generator using carefully designed latent optimization. Through extensive evaluation, we demonstrate state-of-the-art performance for facial reenactment. Furthermore, we show that since our reenactment takes place in a semantic latent space, the results can be semantically edited and stylized in post-processing.
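The abstract does not specify the generator architecture, the loss, or the optimizer used for latent optimization. As a rough illustration of the general idea only (not the authors' method), the sketch below assumes a differentiable generator and a differentiable pose/expression feature extractor, and runs plain gradient descent on a latent code so that the generated image's features match a target. The tanh "generator", the linear "landmark" projection `L`, the prior weight `lam`, and all function names are hypothetical stand-ins.

```python
import numpy as np

def generator(A, w):
    """Hypothetical toy 'personalized generator': maps latent w to an image vector."""
    return np.tanh(A @ w)

def landmark_loss(A, L, w, target, w_avg, lam):
    """Hypothetical pose/expression loss (linear 'landmarks') plus a latent prior term."""
    residual = L @ generator(A, w) - target
    return float(residual @ residual + lam * np.sum((w - w_avg) ** 2))

def optimize_latent(A, L, target, w_avg, lam=1e-3, lr=0.01, steps=2000):
    """Plain gradient descent on the latent code w (toy stand-in for latent optimization)."""
    w = w_avg.copy()  # start from the average latent code
    for _ in range(steps):
        x = generator(A, w)
        residual = L @ x - target
        grad_x = 2.0 * L.T @ residual           # dLoss/dx for the feature term
        grad_h = grad_x * (1.0 - x ** 2)        # backpropagate through tanh
        grad_w = A.T @ grad_h + 2.0 * lam * (w - w_avg)  # add the prior's gradient
        w = w - lr * grad_w
    return w

# Toy demo: recover a latent whose generated 'landmarks' match a target frame.
rng = np.random.default_rng(0)
latent_dim, image_dim, n_landmarks = 8, 32, 6
A = rng.normal(scale=0.3, size=(image_dim, latent_dim))
L = rng.normal(scale=0.3, size=(n_landmarks, image_dim))
w_avg = np.zeros(latent_dim)
w_true = rng.normal(size=latent_dim)
target = L @ generator(A, w_true)  # features of the desired (driving) frame

w_hat = optimize_latent(A, L, target, w_avg)
initial_loss = landmark_loss(A, L, w_avg, target, w_avg, 1e-3)
final_loss = landmark_loss(A, L, w_hat, target, w_avg, 1e-3)
```

In a real system the generator would be a pretrained network, the features would come from a face-analysis model, and gradients would be computed by automatic differentiation rather than by hand; the prior term here simply keeps the latent code near the average, which is one common way to stay in a well-behaved region of latent space.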