In-the-wild dynamic facial expression recognition (DFER) faces a significant challenge: emotion-related expressions are often temporally and spatially diluted by emotion-irrelevant expressions and global context. Most prior DFER methods directly use coupled spatiotemporal representations that may incorporate weakly relevant features with emotion-irrelevant context bias. Several DFER methods highlight dynamic information, but they rely on explicit guidance that may be vulnerable to irrelevant motion. In this paper, we propose a novel Implicit Facial Dynamics Disentanglement framework (IFDD). By extending the wavelet lifting scheme into a fully learnable framework, IFDD disentangles emotion-related dynamic information from emotion-irrelevant global context in an implicit manner, i.e., without explicit operations or external guidance. The disentanglement process consists of two stages. The first is the Inter-frame Static-dynamic Splitting Module (ISSM) for rough disentanglement estimation, which explores inter-frame correlation to generate content-aware splitting indexes on the fly. These indexes split frame features into two groups: one with greater global similarity, and the other with more unique dynamic features. The second stage is the Lifting-based Aggregation-Disentanglement Module (LADM) for further refinement. LADM first aggregates the two groups of features from ISSM with an updater to obtain fine-grained global context features, and then disentangles emotion-related facial dynamic features from the global context with a predictor. Extensive experiments on in-the-wild datasets demonstrate that IFDD outperforms prior supervised DFER methods, achieving higher recognition accuracy at comparable efficiency. Code is available at https://github.com/CyberPegasus/IFDD.
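The split-update-predict structure described above follows the classic lifting scheme. The following is a minimal, non-authoritative sketch of that structure in NumPy: the real ISSM and LADM are learnable networks, whereas here fixed linear maps (`W_u`, `W_p`) and a precomputed per-frame score stand in for them, and all function and variable names are hypothetical.

```python
import numpy as np

def split_frames(feats, scores):
    """ISSM-style split (sketch): rank frames by a content-aware score
    and divide them into a 'static' group (greater global similarity)
    and a 'dynamic' group (more unique dynamics). In IFDD the splitting
    indexes are generated on the fly; here the scores are assumed given.
    feats: (T, D) frame features; scores: (T,) per-frame scores."""
    order = np.argsort(scores)
    half = len(feats) // 2
    static = feats[order[:half]]   # lower score -> closer to global context
    dynamic = feats[order[half:]]  # higher score -> more unique dynamics
    return static, dynamic

def lifting_disentangle(static, dynamic, W_u, W_p):
    """LADM-style lifting step (sketch): an updater aggregates both
    groups into refined global context; a predictor then subtracts the
    context's contribution from the dynamic group, leaving residual
    emotion-related dynamics. W_u, W_p: (D, D) stand-ins for the
    learnable updater and predictor."""
    context = static + dynamic @ W_u    # update: fine-grained global context
    residual = dynamic - context @ W_p  # predict: facial dynamic features
    return context, residual
```

With a zero updater and an identity predictor, the residual reduces to a plain frame difference (`dynamic - static`), which illustrates why a learnable predictor is needed to suppress emotion-irrelevant motion rather than subtracting raw context.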