Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulation. However, lip-sync accuracy degrades when these models are applied to input speech in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task: generating 3D talking heads from speech in diverse languages. We collect a new multilingual 2D video dataset comprising over 420 hours of talking videos in 20 languages. With our proposed dataset, we present a multilingually enhanced model that incorporates language-specific style embeddings, enabling it to capture the unique mouth movements associated with each language. Additionally, we present a metric for assessing lip-sync accuracy in multilingual settings. We demonstrate that training a 3D talking head model with our proposed dataset significantly enhances its multilingual performance. Code and datasets are available at https://multi-talk.github.io/.
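To make the language-specific style embedding idea concrete, below is a minimal PyTorch sketch under assumed details: a learned embedding table indexed by a language ID, whose vector conditions per-frame audio features before they are decoded into facial motion. The class name, dimensions, and the additive conditioning scheme are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class LanguageStyleEmbedding(nn.Module):
    """Hypothetical sketch: one learned style vector per language,
    used to condition the audio-to-motion features. Names and the
    additive conditioning below are assumptions for illustration."""

    def __init__(self, num_languages: int = 20, dim: int = 64):
        super().__init__()
        # One trainable style vector per language in the dataset.
        self.table = nn.Embedding(num_languages, dim)

    def forward(self, language_id: torch.Tensor) -> torch.Tensor:
        # language_id: (batch,) integer IDs -> (batch, dim) style vectors
        return self.table(language_id)

# Usage: condition per-frame audio features on the language style.
emb = LanguageStyleEmbedding()
audio_feats = torch.randn(1, 100, 64)          # (batch, frames, dim), dummy features
style = emb(torch.tensor([3]))                 # style vector for language ID 3
conditioned = audio_feats + style.unsqueeze(1) # broadcast the style over all frames
print(conditioned.shape)                       # torch.Size([1, 100, 64])
```

In a setup like this, the embedding table is trained jointly with the rest of the model, so each language's vector can absorb the systematic mouth-shape differences the abstract describes.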