HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces

From arXiv. Accepted for publication in ICCV 2023. Project page: https://stelabou.github.io/hyperreenact.github.io/ Code: https://github.com/StelaBou/HyperReenact

In this paper, we present HyperReenact, our method for neural face reenactment, which aims to generate realistic talking head images of a source identity driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet they either produce reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or require expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator: we first invert the real images into its latent space and then use a hypernetwork to perform (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, thereby eliminating the dependence on external editing methods that typically produce artifacts. Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images and its remarkable robustness even under extreme head pose changes. We make the code and the pretrained models publicly available at: https://github.com/StelaBou/HyperReenact .
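To make the hypernetwork idea above concrete, the following is a toy sketch in plain Python, not the authors' implementation. It assumes a frozen "generator" reduced to a single weight matrix, and a hypothetical hypernetwork (`ToyHyperNetwork`) that maps source and target features to multiplicative weight offsets of the form W' = W * (1 + delta). That offset form is one common way to modulate pretrained weights; the abstract does not state which update rule HyperReenact uses, and all names, shapes, and feature vectors here are illustrative assumptions.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToyHyperNetwork:
    """Hypothetical stand-in for a hypernetwork.

    Maps concatenated (source, target) features to one offset per
    generator weight. A real model would use learned deep layers.
    """

    def __init__(self, feat_dim, out_rows, out_cols, seed=0):
        rng = random.Random(seed)
        self.shape = (out_rows, out_cols)
        # A single small linear map from features to flattened offsets.
        self.P = [
            [rng.uniform(-0.01, 0.01) for _ in range(2 * feat_dim)]
            for _ in range(out_rows * out_cols)
        ]

    def __call__(self, source_feat, target_feat):
        flat = matvec(self.P, source_feat + target_feat)
        rows, cols = self.shape
        return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def apply_offsets(W, delta):
    """Return new weights W * (1 + delta); the frozen W is not modified."""
    return [
        [w * (1.0 + d) for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


def reenact(frozen_W, hyper, source_feat, target_feat, latent):
    """Predict offsets from both inputs, then run the modulated generator."""
    delta = hyper(source_feat, target_feat)
    return matvec(apply_offsets(frozen_W, delta), latent)


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]          # frozen "generator" weights
    hyper = ToyHyperNetwork(feat_dim=3, out_rows=2, out_cols=2)
    source = [0.1, 0.2, 0.3]              # assumed identity features
    target = [0.4, 0.5, 0.6]              # assumed pose features
    latent = [1.0, 1.0]                   # assumed inverted latent code
    print(reenact(W, hyper, source, target, latent))
```

The point of the design is that the pretrained weights stay frozen and only a per-input offset is predicted, so a single forward pass adapts the generator to a new source and target pair without subject-specific fine-tuning.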
