Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator

Multimodal-driven talking face generation refers to animating a portrait with a given pose, expression, and gaze that are either transferred from a driving image or video, or estimated from text and audio. However, existing methods ignore the potential of the text modality, and their generators mainly follow a source-oriented feature-rearrangement paradigm coupled with unstable GAN frameworks. In this work, we first represent emotion with a text prompt, which inherits rich semantics from CLIP and allows flexible, generalized emotion control. We further reorganize these tasks as target-oriented texture transfer and adopt diffusion models. More specifically, given a textured face as the source and the rendered face projected from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model (TGDM) decomposes the complex transfer problem into a multi-conditional denoising process, in which a Texture Attention-based module accurately models the correspondences between the appearance and geometry cues contained in the source and target conditions, and incorporates extra implicit information for high-fidelity talking face generation. Additionally, TGDM can be gracefully tailored for face swapping. We derive a novel paradigm free of unstable seesaw-style optimization, resulting in simple, stable, and effective training and inference schemes. Extensive experiments demonstrate the superiority of our method.
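
The abstract does not describe the internals of the Texture Attention-based module, so the following is only a minimal sketch of one plausible reading: plain scaled dot-product cross-attention in which geometry features from the rendered target act as queries and appearance features from the textured source act as keys and values. The function name `texture_cross_attention`, the feature shapes, and the absence of learned projections are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def texture_cross_attention(target_geom, source_tex):
    """Hypothetical cross-attention between target geometry and source texture.

    target_geom: (N_t, d) features of the rendered target face (queries).
    source_tex:  (N_s, d) features of the textured source face (keys and values).
    Returns the attended texture, shape (N_t, d), and the attention
    weights, shape (N_t, N_s).
    """
    d = target_geom.shape[-1]
    # Similarity of every target location to every source location.
    scores = target_geom @ source_tex.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Each target location receives a weighted mix of source texture features.
    attended = weights @ source_tex
    return attended, weights

# Toy inputs: 6 target locations, 10 source locations, 8-dimensional features.
rng = np.random.default_rng(0)
target_geom = rng.normal(size=(6, 8))
source_tex = rng.normal(size=(10, 8))
attended, weights = texture_cross_attention(target_geom, source_tex)
```

In a diffusion setting, features like `attended` would be one of several conditions passed to the denoising network at each step; how TGDM actually combines its conditions is not stated in the abstract.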

