In this study, we propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework, aiming to improve the shape consistency and motion control of existing video-based face generation approaches. Our approach adopts the FLAME (Faces Learned with an Articulated Model and Expressions) model as the 3D face parametric representation, providing a unified framework for modeling facial expressions and head pose. This not only enables precise extraction of motion features from driving videos, but also helps faithfully preserve face shape and geometry. Specifically, we enrich the latent diffusion model with 3D expression and pose information by incorporating depth maps, normal maps, and rendering maps derived from FLAME sequences. These maps serve as motion guidance and are encoded into the denoising UNet through a specifically designed Geometric Guidance Encoder (GGE). A multi-layer feature fusion module with integrated self-attention mechanisms combines facial appearance and motion latent features within the spatial domain. By using the 3D face parametric model as motion guidance, our method achieves parametric alignment of face identity between the reference image and the motion captured from the driving video. Experimental results on benchmark datasets show that our method generates high-quality face animations with accurate modeling of expression and head pose variations, and generalizes well to out-of-domain images. Code is publicly available at https://github.com/weimengting/MagicPortrait.
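The abstract describes fusing appearance and motion latent features in the spatial domain via self-attention. The sketch below is a minimal, hypothetical illustration of that general idea (not the paper's actual module): appearance and motion tokens are concatenated along the token axis, passed through single-head self-attention, and the appearance half of the output is kept as the fused result. All names, shapes, and projection matrices here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fuse(appearance, motion, Wq, Wk, Wv):
    """Toy spatial fusion of appearance and motion latents (illustrative only).

    appearance, motion: (N, C) arrays of flattened spatial tokens.
    Wq, Wk, Wv: (C, C) projection matrices (random here; learned in practice).
    """
    x = np.concatenate([appearance, motion], axis=0)        # (2N, C) joint tokens
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (2N, 2N) weights
    fused = attn @ v                                        # (2N, C)
    # Keep the appearance-token half: each appearance token has now
    # attended over both appearance and motion tokens.
    return fused[: appearance.shape[0]]

rng = np.random.default_rng(0)
C = 8
app = rng.standard_normal((16, C))   # 16 spatial tokens of appearance features
mot = rng.standard_normal((16, C))   # 16 spatial tokens of motion features
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out = self_attention_fuse(app, mot, Wq, Wk, Wv)
print(out.shape)  # (16, 8): fused appearance tokens, same shape as the input
```

In a real diffusion UNet this fusion would operate on learned latent feature maps at multiple resolutions, with multi-head attention and trained projections; the sketch only shows the token-level mechanics.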