Micro-expressions (MEs) are involuntary, subtle facial expressions thought to reveal feelings that people are trying to hide. ME spotting is the task of detecting the temporal intervals that contain MEs in a video. Detecting such quick and subtle motions in long videos is difficult. Recent works combine detailed facial motion representations, such as optical flow, with deep learning models, which leads to high computational complexity. To reduce this complexity and achieve real-time operation, we propose RMES, a real-time ME spotting framework. We represent motion using phase computed by the Riesz pyramid and feed this representation into a three-stream shallow CNN, which predicts the likelihood that each frame belongs to an ME. Compared with optical flow, phase provides more localized motion estimates, which are essential for ME spotting, and this results in higher performance. Using phase also reduces the computation required by the ME spotting pipeline by 77.8%. Despite its relative simplicity and low computational complexity, our framework achieves state-of-the-art performance on two public datasets: CAS(ME)² and SAMM Long Videos.
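To make the link between per-frame likelihoods and spotted intervals concrete, here is a minimal sketch of one possible post-processing step. The abstract does not say how RMES turns frame-level predictions into intervals, so the function name `scores_to_intervals`, the `threshold` and `min_len` parameters, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def scores_to_intervals(
    scores: List[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
    min_len: int = 1,        # hypothetical minimum interval length in frames
) -> List[Tuple[int, int]]:
    """Group consecutive frames whose score is >= threshold into intervals.

    Returns a list of (start, end) frame indices, both inclusive.
    Runs shorter than min_len frames are discarded.
    """
    intervals: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at frame i - 1; keep it if it is long enough.
            if i - start >= min_len:
                intervals.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(scores) - start >= min_len:
        intervals.append((start, len(scores) - 1))
    return intervals


# Made-up per-frame likelihoods for a nine-frame clip.
example_scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.1, 0.6, 0.2]
print(scores_to_intervals(example_scores, threshold=0.5, min_len=2))
# prints [(2, 4)]
```

With `min_len=2`, the isolated high score at frame 7 is dropped as too short, which illustrates why some minimum-duration rule is commonly paired with thresholding when spotting brief events.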