Facial micro-expressions, characterized by their subtle and brief nature, are valuable indicators of genuine emotions. Despite their significance in psychology, security, and behavioral analysis, micro-expression recognition remains challenging due to the difficulty of capturing subtle facial movements. Optical flow has been widely employed as an input modality for this task due to its effectiveness. However, most existing methods compute optical flow only between the onset and apex frames, thereby overlooking essential motion information in the apex-to-offset phase. To address this limitation, we first introduce a comprehensive motion representation, termed Magnitude-Modulated Combined Optical Flow (MM-COF), which integrates motion dynamics from both the onset-to-apex and apex-to-offset phases into a unified descriptor suitable for direct use in recognition networks. Building on this representation, we then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules, allowing the network to adaptively fuse motion cues and focus on salient facial regions for classification. Experimental evaluations on the MMEW, SMIC, CASME-II, and SAMM datasets, widely recognized as standard benchmarks, demonstrate that our proposed MM-COF representation and FMANet outperform existing methods, underscoring the potential of a learnable, dual-phase framework in advancing micro-expression recognition.
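To make the MM-COF construction concrete, the following is a minimal Python sketch of a dual-phase, magnitude-modulated flow descriptor. It assumes dense Farneback optical flow (via OpenCV) as the flow estimator and a simple max-normalized magnitude weighting as the modulation; the function name `mm_cof` and the exact modulation form are illustrative assumptions, not the paper's definitive implementation.

```python
import cv2
import numpy as np

def farneback_flow(frame_a, frame_b):
    """Dense optical flow between two grayscale frames (Farneback method)."""
    return cv2.calcOpticalFlowFarneback(
        frame_a, frame_b, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

def mm_cof(onset, apex, offset, eps=1e-6):
    """Illustrative MM-COF sketch: combine onset-to-apex and apex-to-offset
    flow fields, each modulated by its normalized motion magnitude
    (assumed modulation scheme)."""
    flow_rise = farneback_flow(onset, apex)    # onset-to-apex phase
    flow_fall = farneback_flow(apex, offset)   # apex-to-offset phase

    # Per-pixel motion magnitudes for each phase.
    mag_rise = np.linalg.norm(flow_rise, axis=-1, keepdims=True)
    mag_fall = np.linalg.norm(flow_fall, axis=-1, keepdims=True)

    # Magnitude modulation (assumption): emphasize pixels with stronger motion
    # by weighting each flow field with its max-normalized magnitude.
    w_rise = mag_rise / (mag_rise.max() + eps)
    w_fall = mag_fall / (mag_fall.max() + eps)

    # Stack both modulated flow fields into one multi-channel descriptor
    # (H, W, 4) that a recognition network can consume directly.
    return np.concatenate([w_rise * flow_rise, w_fall * flow_fall], axis=-1)
```

In this sketch the two phases are kept as separate channels rather than summed, so a downstream network can still learn how to fuse them; FMANet, as described above, replaces this fixed modulation and fusion with learnable modules.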