Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model

Multimodal-driven talking face generation refers to animating a portrait with a pose, expression, and gaze that are either transferred from a driving image or video, or estimated from text and audio. However, existing methods ignore the potential of the text modality, and their generators mainly follow a source-oriented feature-rearrangement paradigm coupled with unstable GAN frameworks. In this work, we first represent emotion in a text prompt, which can inherit rich semantics from CLIP and allows flexible, generalized emotion control. We further reformulate these tasks as target-oriented texture transfer and adopt diffusion models. More specifically, given a textured face as the source and a face rendered from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model (TGDM) decomposes the complex transfer problem into a multi-conditional denoising process. In this process, a Texture Attention-based module accurately models the correspondences between the appearance cues in the source condition and the geometry cues in the target condition, and incorporates extra implicit information for high-fidelity talking face generation. Additionally, TGDM can be gracefully tailored for face swapping. We derive a novel paradigm free of unstable seesaw-style optimization, resulting in simple, stable, and effective training and inference schemes. Extensive experiments demonstrate the superiority of our method.
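The abstract gives only the high-level design, so the following is a minimal NumPy sketch under stated assumptions, not TGDM itself. It assumes the Texture Attention module behaves like standard cross-attention, with target-geometry tokens querying source-texture tokens, and that training uses the standard DDPM noise-prediction loss. The names `texture_attention`, `toy_denoiser`, and `denoising_loss`, the token shapes, the single linear "denoiser", and the noise schedule are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def texture_attention(target_geo, source_tex, w_q, w_k, w_v):
    """Hypothetical texture attention written as plain cross-attention.

    target_geo: (N_t, d) tokens from the rendered 3DMM face (geometry cue).
    source_tex: (N_s, d) tokens from the textured source face (appearance cue).
    Returns the appearance gathered for each target token and the (N_t, N_s)
    attention map, whose rows act as soft source-to-target correspondences.
    """
    q = target_geo @ w_q  # queries come from geometry
    k = source_tex @ w_k  # keys come from texture
    v = source_tex @ w_v  # values carry appearance
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1
    return attn @ v, attn


def noisy_sample(x0, t, alphas_cumprod, rng):
    # Standard DDPM forward process: x_t = sqrt(a_bar_t) x0 + sqrt(1 - a_bar_t) eps.
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps


def toy_denoiser(x_t, target_geo, source_tex, params):
    # Hypothetical multi-conditional denoiser: concatenate the noisy target tokens,
    # the geometry condition, and the texture transferred onto that geometry,
    # then predict the noise with a single linear layer.
    transferred, _ = texture_attention(
        target_geo, source_tex, params["w_q"], params["w_k"], params["w_v"]
    )
    h = np.concatenate([x_t, target_geo, transferred], axis=-1)
    return h @ params["w_out"]


def denoising_loss(x0, target_geo, source_tex, params, alphas_cumprod, rng):
    # Epsilon-prediction objective: a plain regression loss with no discriminator.
    t = rng.integers(len(alphas_cumprod))
    x_t, eps = noisy_sample(x0, t, alphas_cumprod, rng)
    eps_hat = toy_denoiser(x_t, target_geo, source_tex, params)
    return float(np.mean((eps_hat - eps) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_t, n_s = 8, 16, 16
    params = {name: 0.1 * rng.standard_normal((d, d)) for name in ("w_q", "w_k", "w_v")}
    params["w_out"] = 0.1 * rng.standard_normal((3 * d, d))
    betas = np.linspace(1e-4, 0.02, 1000)
    alphas_cumprod = np.cumprod(1.0 - betas)
    x0 = rng.standard_normal((n_t, d))  # clean target-face tokens (toy data)
    target_geo = rng.standard_normal((n_t, d))
    source_tex = rng.standard_normal((n_s, d))
    print(denoising_loss(x0, target_geo, source_tex, params, alphas_cumprod, rng))
```

In this sketch the attention map has one row per target token, so it can be read as a soft correspondence between the rendered geometry and the source appearance. The training signal is an ordinary regression on the added noise, which illustrates how a diffusion objective avoids balancing a generator against a discriminator; the actual TGDM architecture and losses may differ.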


Related content

MoDELS


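The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience in modeling and model-driven software and systems. This year's edition offers the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/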