Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speech, the mouth exhibits strong motion, while the other facial regions typically show comparatively weak activity. Existing approaches often simplify the problem by directly mapping single-level speech features to the entire facial animation, which overlooks the differences in facial activity intensity and leads to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. We define a novel facial activity intensity metric, obtained by computing the short-time Fourier transform of facial vertex displacements, to distinguish between strong and weak facial activity. Based on the differences in facial activity intensity, we propose a dual-branch decoding framework that synthesizes strong and weak facial activity synchronously, enabling facial animation synthesis over a wider range of intensities. Furthermore, we propose a weighted hierarchical feature encoder that establishes the temporal correlation between hierarchical speech features and facial activity at different intensities, which ensures lip-sync and plausible facial expressions. Extensive qualitative and quantitative experiments, as well as a user study, indicate that CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/
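To make the idea of an STFT-based facial activity intensity metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes a mesh sequence of shape `(T, V, 3)`, measures each vertex's displacement from a neutral template, and scores a vertex by the mean magnitude of its non-DC short-time spectrum. The function names, the Hann window, the window and hop sizes, and the quantile threshold used to separate strong from weak regions are all illustrative assumptions.

```python
import numpy as np


def activity_intensity(vertices, template, win=16, hop=8):
    """Hypothetical per-vertex activity score from a short-time Fourier transform.

    vertices: (T, V, 3) animated mesh sequence.
    template: (V, 3) neutral face used as the displacement reference.
    Returns a (V,) array; larger values mean stronger temporal activity.
    """
    # Displacement magnitude of every vertex in every frame: (T, V).
    disp = np.linalg.norm(vertices - template, axis=-1)

    # Slice the sequence into overlapping, Hann-windowed segments: (F, win, V).
    window = np.hanning(win)
    starts = range(0, disp.shape[0] - win + 1, hop)
    frames = np.stack([disp[s:s + win] * window[:, None] for s in starts])

    # Magnitude spectrum along the time axis of each segment: (F, win//2 + 1, V).
    spectrum = np.abs(np.fft.rfft(frames, axis=1))

    # Skip the DC bin so that a constant offset contributes little,
    # then average over segments and frequencies.
    return spectrum[:, 1:, :].mean(axis=(0, 1))


def split_regions(intensity, quantile=0.8):
    """Boolean mask of 'strong' vertices; the quantile threshold is an assumption."""
    return intensity >= np.quantile(intensity, quantile)
```

With a synthetic sequence in which a few vertices oscillate with a much larger amplitude than the rest, `split_regions(activity_intensity(vertices, template))` marks exactly those vertices as strong. In a dual-branch design, a mask of this kind could be used to route vertices to a strong-activity or weak-activity decoder.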