Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, typically lasting less than half a second. Recognizing these subtle expressions is critical for applications in psychology, security, and behavioral analysis. Although deep learning has enabled significant advances in micro-expression recognition (MER), its effectiveness is limited by the scarcity of annotated ME datasets. This data limitation not only hinders generalization but also restricts the diversity of motion patterns captured during training. Existing MER studies predominantly rely on simple spatial augmentations (e.g., flipping, rotation) and overlook temporal augmentation strategies that could better exploit motion characteristics. To address this gap, this paper proposes a phase-aware temporal augmentation method based on dynamic images. Rather than encoding the entire expression as a single onset-to-offset dynamic image (DI), our approach decomposes each expression sequence into two motion phases: onset-to-apex and apex-to-offset. A separate DI is generated for each phase, forming a dual-phase DI augmentation strategy. These phase-specific representations enrich motion diversity and introduce complementary temporal cues that are crucial for recognizing subtle facial transitions. Extensive experiments on the CASME-II and SAMM datasets using six deep architectures, including CNNs, a Vision Transformer, and the lightweight LEARNet, demonstrate consistent improvements in recognition accuracy, unweighted F1-score, and unweighted average recall, metrics that are crucial for addressing class imbalance in MER. When combined with spatial augmentations, our method achieves up to a 10\% relative improvement. The proposed augmentation is simple, model-agnostic, and effective in low-resource settings, offering a promising direction for robust and generalizable MER.
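To make the dual-phase construction concrete, the following is a minimal sketch of how one might compute the two phase-specific dynamic images. It assumes the standard approximate rank pooling coefficients of Bilen et al. for dynamic images ($\alpha_t = 2t - T - 1$); the function names, the min-max rescaling to [0, 255], and the array layout are illustrative assumptions, not the paper's implementation.

\begin{verbatim}
# Sketch: dual-phase dynamic image (DI) augmentation.
# Assumes approximate rank pooling weights alpha_t = 2t - T - 1 (Bilen et al.);
# names and normalization are illustrative, not from the paper's code.
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, C) frame stack into a single dynamic image."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float32)
    alpha = 2.0 * t - T - 1.0                      # rank pooling weights
    di = np.tensordot(alpha, frames.astype(np.float32), axes=(0, 0))
    di -= di.min()                                 # rescale to [0, 255] so the
    di /= max(di.max(), 1e-8)                      # DI feeds an image backbone
    return (di * 255.0).astype(np.uint8)

def dual_phase_dis(sequence: np.ndarray, apex_idx: int):
    """Split an onset-to-offset sequence at the apex frame and return
    one DI per motion phase (onset->apex, apex->offset)."""
    onset_to_apex = sequence[: apex_idx + 1]
    apex_to_offset = sequence[apex_idx:]
    return dynamic_image(onset_to_apex), dynamic_image(apex_to_offset)
\end{verbatim}

Under these assumptions, each expression clip yields two augmented training images in addition to (or in place of) the single onset-to-offset DI, which is what enriches the motion diversity seen by the downstream backbones.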