Laughter is a unique expression, essential to positive human social interaction. Although current 3D talking head generation methods produce convincing verbal articulation, they often fail to capture the vitality and subtleties of laughter and smiles, despite their importance in social contexts. In this paper, we introduce a novel task: generating 3D talking heads capable of both articulate speech and authentic laughter. Our newly curated dataset comprises 2D laughing videos paired with pseudo-annotated and human-validated 3D FLAME parameters and vertices. Using this dataset, we present a strong baseline with a two-stage training scheme: the model first learns to talk and then acquires the ability to express laughter. Extensive experiments demonstrate that our method performs favorably compared to existing approaches in both talking head generation and the expression of laughter signals. We further explore potential applications of our method to rigging realistic avatars.