DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Diffusion models have shown remarkable success in a variety of downstream generative tasks, yet they remain under-explored in the important and challenging task of expressive talking head generation. In this work, we propose DreamTalk, a framework that fills this gap through careful design choices that unlock the potential of diffusion models for generating expressive talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network consistently synthesizes high-quality, audio-driven face motions across diverse expressions. To enhance the expressiveness and accuracy of lip motions, we introduce a style-aware lip expert that guides lip-sync while taking the speaking style into account. To eliminate the need for an expression reference video or text, an additional diffusion-based style predictor predicts the target expression directly from the audio. In this way, DreamTalk harnesses powerful diffusion models to generate expressive faces effectively and reduces the reliance on expensive style references. Experimental results demonstrate that DreamTalk generates photo-realistic talking faces with diverse speaking styles and accurate lip motions, surpassing existing state-of-the-art counterparts.
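To make the idea of a diffusion-based denoising network conditioned on audio and speaking style more concrete, the following is a minimal sketch of standard DDPM ancestral sampling in pure Python. It is not DreamTalk's implementation: the names `linear_betas`, `toy_denoiser`, and `sample_motion`, the noise-prediction parameterization, the linear schedule, and the way audio and style are combined are all assumptions made for illustration, and the "denoiser" is a hand-written oracle rather than a trained network.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule from the standard DDPM formulation (assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_denoiser(x_t, alpha_bar_t, audio_feat, style_code):
    """Hypothetical stand-in for a trained denoising network.

    It pretends the clean motion is audio_feat + style_code and returns the
    noise that would turn that motion into x_t. A real network would have to
    learn this mapping from data.
    """
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale * (a + s)) / sigma
            for x, a, s in zip(x_t, audio_feat, style_code)]


def sample_motion(audio_feat, style_code, num_steps=50, seed=0):
    """Run DDPM ancestral sampling from pure noise down to a motion vector."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)

    # Cumulative products of (1 - beta_t), usually written alpha_bar_t.
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    # Start from Gaussian noise with the same length as the conditioning.
    x = [rng.gauss(0.0, 1.0) for _ in audio_feat]
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps = toy_denoiser(x, alpha_bar, audio_feat, style_code)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        # Posterior mean of the reverse step given the predicted noise.
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps)]
        if t > 0:
            # Inject fresh noise on every step except the last one.
            x = [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    audio = [0.2, -0.1, 0.4]   # made-up audio features
    style = [0.1, 0.3, -0.2]   # made-up style code
    print(sample_motion(audio, style))
```

Because the toy denoiser is an oracle that already knows the target, the final step lands on `audio_feat + style_code` up to floating-point error. With a learned network, the same loop would instead produce a sample from the model's distribution over motions given the audio and style conditions.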

