Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

We aim to edit the lip movements in a talking video according to a given speech signal while preserving personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle both sub-problems within a single generative model, which forces a difficult trade-off between lip-sync quality and visual detail preservation. Instead, we propose to disentangle motion and appearance and generate them sequentially, using a speech-to-motion diffusion model followed by a motion-conditioned appearance generation model. Challenges remain in each stage, however: motion-aware identity preservation in (1) and visual detail preservation in (2). To preserve personal identity, we represent motion with landmarks and additionally employ a landmark-based identity loss. To capture motion-agnostic visual details, we use separate encoders for the lip appearance, the non-lip appearance, and the motion, and integrate their outputs with a learned fusion module. We train our model, MyTalk, on a large-scale and diverse dataset. Experiments show that our method generalizes well to unseen, even out-of-domain, speakers in terms of both lip sync and visual detail preservation. We encourage readers to watch the videos on our project page (https://Ingrid789.github.io/MyTalk/).
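The abstract does not define the landmark-based identity loss, so the following is only a toy sketch of one way such a loss could be set up, not the formulation used in MyTalk. It assumes 2D landmarks given as `(x, y)` pairs and compares the generated and reference landmark shapes after removing translation and scale. The function names `normalize_shape` and `landmark_identity_loss` are hypothetical.

```python
import math


def normalize_shape(landmarks):
    """Center 2D landmarks at the origin and scale them to unit RMS radius.

    Assumption: identity is approximated by landmark geometry that is
    independent of where the face sits in the frame and how large it is.
    """
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    centered = [(x - cx, y - cy) for x, y in landmarks]
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if rms == 0.0:
        # Degenerate case: all points coincide, so there is no scale to remove.
        return centered
    return [(x / rms, y / rms) for x, y in centered]


def landmark_identity_loss(generated, reference):
    """Mean squared distance between the two normalized landmark shapes.

    Hypothetical illustration only; the paper's actual loss may differ.
    """
    if len(generated) != len(reference):
        raise ValueError("landmark sets must have the same number of points")
    g = normalize_shape(generated)
    r = normalize_shape(reference)
    total = sum(
        (gx - rx) ** 2 + (gy - ry) ** 2
        for (gx, gy), (rx, ry) in zip(g, r)
    )
    return total / len(g)


# Usage: a shifted and rescaled copy of the reference shape gives a loss near
# zero, while a differently proportioned shape gives a positive loss.
reference = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
same_shape = [(10.0, 10.0), (14.0, 10.0), (14.0, 12.0), (10.0, 12.0)]
other_shape = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
loss_same = landmark_identity_loss(same_shape, reference)
loss_other = landmark_identity_loss(other_shape, reference)
```

This sketch ignores rotation and expression, so a practical loss would need a way to separate speaker-specific geometry from the lip motion itself, which is the "motion-aware" aspect the abstract mentions.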

