Micro-expressions are typically regarded as unconscious manifestations of a person's genuine emotions. However, their short duration and subtle signals pose significant challenges for downstream recognition. To address this, we propose a multi-task learning framework, the Adaptive Motion Magnification and Sparse Mamba (AMMSM). The framework enhances the accurate capture of micro-expressions through self-supervised subtle motion magnification, while its sparse spatial selection Mamba architecture combines sparse activation with the advanced Visual Mamba model to model key motion regions and their informative representations more effectively. In addition, we employ an evolutionary search to optimize the magnification factor and the sparsity ratios of the spatial selection, followed by fine-tuning to further improve performance. Extensive experiments on two standard datasets demonstrate that the proposed AMMSM achieves state-of-the-art (SOTA) accuracy and robustness.
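As a rough illustration of the two mechanisms named above, the sketch below pairs a learnable magnification factor with top-k spatial token selection. This is a minimal sketch under stated assumptions, not the paper's implementation: the additive magnification rule (anchor + alpha · motion), the saliency scores, and all module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: Lagrangian-style motion magnification plus
# top-k sparse spatial token selection. Names and rules are assumptions
# for illustration, not the AMMSM implementation.

class MotionMagnifier(nn.Module):
    """Amplify frame-to-anchor differences by a learnable factor alpha."""
    def __init__(self, alpha: float = 2.0):
        super().__init__()
        # Magnification factor; in AMMSM such a factor is tuned via
        # evolutionary search rather than fixed by hand.
        self.alpha = nn.Parameter(torch.tensor(alpha))

    def forward(self, anchor: torch.Tensor, frame: torch.Tensor) -> torch.Tensor:
        # anchor, frame: (B, C, H, W); motion is their pixel-level difference
        return anchor + self.alpha * (frame - anchor)

def sparse_spatial_select(tokens: torch.Tensor, scores: torch.Tensor, ratio: float):
    """Keep only the top `ratio` fraction of spatial tokens by score.

    tokens: (B, N, D) patch embeddings; scores: (B, N) saliency scores.
    Returns the selected tokens and their indices.
    """
    k = max(1, int(tokens.size(1) * ratio))
    idx = scores.topk(k, dim=1).indices                      # (B, k)
    selected = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
    return selected, idx

# Toy usage: magnify apex-to-onset motion, then keep half of the tokens.
if __name__ == "__main__":
    onset, apex = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    magnified = MotionMagnifier(alpha=4.0)(onset, apex)
    tokens, scores = torch.rand(2, 16, 32), torch.rand(2, 16)
    kept, idx = sparse_spatial_select(tokens, scores, ratio=0.5)
    print(magnified.shape, kept.shape)  # (2, 3, 64, 64), (2, 8, 32)
```

In this reading, the retained tokens would be the ones passed to the Visual Mamba backbone, so that state-space modeling is spent only on the sparse set of regions where magnified motion is salient.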