While current face animation methods can manipulate expressions individually, they suffer from several limitations. The expressions produced by some motion-based facial reenactment models are crude. Other approaches, which model expressions with facial action units, cannot generalize to arbitrary expressions not covered by the annotations. In this paper, we introduce a novel Geometry-aware Facial Expression Translation (GaFET) framework, which is based on parametric 3D facial representations and can stably decouple expression. Within this framework, a Multi-level Feature Aligned Transformer is proposed to complement non-geometric facial detail features while addressing the alignment challenge of spatial features. Further, we design a De-expression model based on StyleGAN to reduce the learning difficulty of GaFET on unpaired "in-the-wild" images. Extensive qualitative and quantitative experiments demonstrate that we achieve higher-quality and more accurate facial expression transfer results than state-of-the-art methods, and show applicability to various poses and complex textures. Moreover, our method requires neither videos nor annotated training data, making it easier to use and generalize.