Micro-expressions (MEs) are spontaneous, rapid, and subtle facial movements that can be neither forged nor suppressed. They are important nonverbal communication cues, but their transience and low intensity make them difficult to recognize. Recent deep learning methods for ME recognition rely on feature extraction and fusion techniques; however, feature learning targeted to the recognition task and efficient feature fusion tailored to ME characteristics remain under-studied. To address these issues, we propose a novel framework, Feature Representation Learning with adaptive Displacement Generation and Transformer fusion (FRL-DGT). A convolutional Displacement Generation Module (DGM) trained with self-supervised learning extracts dynamic features from the onset and apex frames, targeted to the subsequent ME recognition task. A Transformer Fusion mechanism composed of three Transformer-based fusion modules (local fusion and global fusion based on facial action unit (AU) regions, and full-face fusion) then extracts multi-level informative features from the DGM output for the final ME prediction. Extensive experiments under leave-one-subject-out (LOSO) evaluation demonstrate the superiority of the proposed FRL-DGT over state-of-the-art methods.
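The LOSO protocol mentioned above can be illustrated with a minimal sketch. It only shows the general idea of holding out every sample of one subject per fold and pooling predictions across folds; it is not the authors' evaluation code, and the names `loso_splits`, `loso_accuracy`, and `fit_predict` are hypothetical helpers introduced here for illustration.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the subject that sample i belongs to. Hypothetical helper,
    not taken from the paper.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        # Train on every sample that does not belong to the held-out subject.
        train_idx = sorted(
            i for sid, idxs in by_subject.items() if sid != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def loso_accuracy(subject_ids, labels, fit_predict):
    """Pool predictions over all LOSO folds and return the overall accuracy.

    fit_predict(train_idx, test_idx) is an assumed callback that trains a fresh
    model on train_idx and returns one predicted label per entry of test_idx.
    """
    correct = 0
    for _, train_idx, test_idx in loso_splits(subject_ids):
        preds = fit_predict(train_idx, test_idx)
        correct += sum(p == labels[i] for p, i in zip(preds, test_idx))
    return correct / len(labels)
```

Because no subject ever appears in both the training and the test set of a fold, this kind of split measures generalization to unseen people rather than to unseen clips of known people. Any classifier can be plugged in through `fit_predict`, as long as it is re-initialized for each fold.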