Continuously Controllable Facial Expression Editing in Talking Face Videos

From arXiv. Accepted by IEEE Transactions on Affective Computing (DOI: 10.1109/TAFFC.2023.3334511). A demo video (https://youtu.be/WD-bNVya6kM) and a project page (https://raineggplant.github.io/FEE4TV) are available.

Recently, audio-driven talking face video generation has attracted considerable attention. However, very few studies address emotional editing of these talking face videos with continuously controllable expressions, despite strong demand for it in industry. The challenge is that speech-related expressions and emotion-related expressions are often highly coupled. Meanwhile, traditional image-to-image translation methods do not work well in this setting because expressions are also coupled with other attributes such as pose: translating the expression of the character in each frame may simultaneously change the head pose, due to bias in the training data distribution. In this paper, we propose a high-quality facial expression editing method for talking face videos that allows the user to control the target emotion in the edited video continuously. We present a new perspective on this task, treating it as a special case of motion information editing, where we use a 3DMM to capture major facial movements and an associated texture map modeled by a StyleGAN to capture appearance details. Both representations (3DMM and texture map) contain emotional information, can be continuously modified by neural networks, and can be easily smoothed by averaging in coefficient/latent spaces, making our method simple yet effective. We also introduce a mouth shape preservation loss to control the trade-off between lip synchronization and the degree of exaggeration of the edited expression. Extensive experiments and a user study show that our method achieves state-of-the-art performance across various evaluation criteria.
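The abstract's two ideas of continuous control and smoothing by averaging in coefficient space can be illustrated with a small sketch. This is not the paper's method, which modifies the 3DMM coefficients and texture latents with neural networks. It only shows, on plain lists of numbers, what a continuous intensity parameter and a temporal average over per-frame coefficients might look like. The names `blend_expression` and `smooth_sequence`, the linear interpolation, and the coefficient layout are all assumptions made for illustration.

```python
from typing import List

# One frame's expression coefficients (hypothetical layout, not the paper's).
Coeffs = List[float]


def blend_expression(source: Coeffs, target: Coeffs, intensity: float) -> Coeffs:
    """Linearly interpolate between source and target coefficients.

    intensity = 0.0 keeps the source expression, 1.0 reaches the target.
    Linear interpolation is an illustrative stand-in for a learned edit.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return [(1.0 - intensity) * s + intensity * t for s, t in zip(source, target)]


def smooth_sequence(frames: List[Coeffs], window: int = 3) -> List[Coeffs]:
    """Centered moving average over time; the window is clipped at both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        neighbours = frames[lo:hi]
        dims = len(frames[i])
        smoothed.append(
            [sum(f[d] for f in neighbours) / len(neighbours) for d in range(dims)]
        )
    return smoothed


if __name__ == "__main__":
    neutral = [0.0, 1.0]
    happy = [1.0, 0.0]
    # A quarter of the way from the neutral to the happy coefficients.
    print(blend_expression(neutral, happy, 0.25))
    # A one-dimensional spike is flattened by the temporal average.
    print(smooth_sequence([[0.0], [3.0], [0.0]], window=3))
```

Sweeping `intensity` from 0 to 1 gives a continuous family of edited coefficients, and a wider `window` trades temporal detail for less frame-to-frame jitter.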

