Micro-expression recognition is one of the most challenging topics in affective computing. It aims to recognize tiny facial movements that occur within a brief period, i.e., 0.25 to 0.5 seconds, and are difficult for humans to perceive. Recent advances in pre-training deep bidirectional Transformers (BERT) have significantly improved self-supervised learning in computer vision. However, standard BERT for vision problems is designed to learn only from full images or videos, and its architecture cannot accurately detect the details of facial micro-expressions. This paper presents Micron-BERT ($\mu$-BERT), a novel approach to facial micro-expression recognition. The proposed method automatically captures these movements in an unsupervised manner, based on two key ideas. First, we employ Diagonal Micro-Attention (DMA) to detect tiny differences between two frames. Second, we introduce a new Patch of Interest (PoI) module to localize and highlight micro-expression regions of interest while suppressing noisy backgrounds and distractions. By incorporating these components into an end-to-end deep network, the proposed $\mu$-BERT significantly outperforms all previous work on various micro-expression tasks. $\mu$-BERT can be trained on a large-scale unlabeled dataset, i.e., up to 8 million images, and achieves high accuracy on new, unseen facial micro-expression datasets. Empirical experiments show that $\mu$-BERT consistently outperforms state-of-the-art methods on four micro-expression benchmarks, namely SAMM, CASME II, SMIC, and CASME3, by significant margins. Code will be available at \url{https://github.com/uark-cviu/Micron-BERT}.