4D facial expression synthesis is a critical problem in computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose FC-4DFS, a frequency-controlled 4D facial expression synthesis method. Specifically, we introduce a frequency-controlled LSTM network that generates a 4D facial expression sequence of a given length, frame by frame, from a given neutral landmark. We also propose a temporal coherence loss that enhances the perception of temporal sequence motion and improves the accuracy of relative displacements. Furthermore, we design a Multi-level Identity-Aware Displacement Network based on a cross-attention mechanism to reconstruct the 4D facial expression sequences from the landmark sequences. FC-4DFS achieves flexible, state-of-the-art generation of 4D facial expression sequences of different lengths on the CoMA and Florence4D datasets. The code will be available on GitHub.
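The abstract does not give the form of the temporal coherence loss, so the following is only a minimal sketch of one plausible reading: a mean squared error between the inter-frame displacements of a predicted landmark sequence and those of a ground-truth sequence. The function names `frame_displacements` and `temporal_coherence_loss`, the nested-list data layout, and the choice of squared error are all assumptions made for illustration, not the formulation used in FC-4DFS.

```python
from typing import List, Sequence

# Assumed layout: a sequence is a list of frames, a frame is a list of
# landmarks, and a landmark is an (x, y, z) triple of floats.
Frame = Sequence[Sequence[float]]


def frame_displacements(seq: Sequence[Frame]) -> List[List[List[float]]]:
    """Per-landmark displacement between each pair of consecutive frames."""
    return [
        [
            [b - a for a, b in zip(p_prev, p_next)]
            for p_prev, p_next in zip(f_prev, f_next)
        ]
        for f_prev, f_next in zip(seq, seq[1:])
    ]


def temporal_coherence_loss(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Hypothetical loss: MSE between predicted and target inter-frame
    displacements. This is an illustrative assumption, not the paper's loss."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("sequences must have equal length of at least 2 frames")
    total, count = 0.0, 0
    for disp_p, disp_t in zip(frame_displacements(pred), frame_displacements(target)):
        for vec_p, vec_t in zip(disp_p, disp_t):
            for a, b in zip(vec_p, vec_t):
                total += (a - b) ** 2
                count += 1
    return total / count


# Toy data: one landmark moving one unit along x per frame.
target = [[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]]
# Same motion, shifted by a constant offset.
shifted = [[[5.0, 5.0, 5.0]], [[6.0, 5.0, 5.0]], [[7.0, 5.0, 5.0]]]
# No motion at all.
static = [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]

loss_shifted = temporal_coherence_loss(shifted, target)
loss_static = temporal_coherence_loss(static, target)
```

In this sketch a constant positional offset leaves the loss at zero, while a sequence with the wrong motion is penalized. A term of this kind constrains relative displacements only, so it would presumably be combined with a per-frame position loss.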