Benchmarking Micro-action Recognition: Dataset, Methods, and Applications

Micro-actions are imperceptible non-verbal behaviours characterised by low-intensity movement. They offer insights into the feelings and intentions of individuals and are important for human-oriented applications such as emotion recognition and psychological assessment. However, identifying, differentiating, and understanding micro-actions is challenging, because these subtle human behaviours are imperceptible and difficult to access in everyday life. In this study, we collect a new micro-action dataset, Micro-action-52 (MA-52), and propose a benchmark method, the micro-action network (MANet), for the micro-action recognition (MAR) task. Uniquely, MA-52 provides a whole-body perspective that covers gestures and upper- and lower-limb movements, with the aim of revealing comprehensive micro-action cues. Specifically, MA-52 contains 52 micro-action categories along with seven body-part labels, and encompasses a full array of realistic and natural micro-actions: 22,422 video instances from 205 participants, collated from psychological interviews. On the proposed dataset, we evaluate MANet and nine other prevalent action recognition methods. MANet incorporates squeeze-and-excitation (SE) blocks and the temporal shift module (TSM) into the ResNet architecture to model the spatiotemporal characteristics of micro-actions. In addition, a joint-embedding loss is designed for semantic matching between videos and action labels; this loss helps distinguish between visually similar yet distinct micro-action categories. An extended application to emotion recognition demonstrates one important use of the proposed dataset and method. Future work will explore human behaviour, emotion, and psychological assessment in greater depth. The dataset and source code are released at https://github.com/VUT-HFUT/Micro-Action.
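The abstract names three ingredients of MANet (SE, TSM, and a joint-embedding loss) without giving their formulations. The standard-library Python sketch below illustrates each one under stated assumptions; it is not MANet's released code. The temporal shift follows the usual TSM scheme of moving one fold of channels backward in time and one fold forward (the 1/8 fold ratio is an assumed default). The SE function is a standard squeeze-and-excitation channel gate. The joint-embedding loss is assumed here to be a softmax cross-entropy over cosine similarities between a video embedding and one embedding per action label. The exact loss form, the temperature, the function names, and the source of the label embeddings (for example, some text encoder) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # nested lists indexed as [row][channel]


def temporal_shift(x: Matrix, fold_div: int = 8) -> Matrix:
    """TSM-style shift on a [time][channel] map (fold_div=8 is an assumed default).

    The first fold of channels takes values from the next frame, the second
    fold takes values from the previous frame, the rest stay put; vacated
    positions at the sequence boundaries are zero-padded.
    """
    t_len, c_len = len(x), len(x[0])
    fold = c_len // fold_div
    out = [[0.0] * c_len for _ in range(t_len)]
    for t in range(t_len):
        for c in range(c_len):
            if c < fold:
                if t + 1 < t_len:  # shift left in time: frame t sees frame t+1
                    out[t][c] = x[t + 1][c]
            elif c < 2 * fold:
                if t - 1 >= 0:  # shift right in time: frame t sees frame t-1
                    out[t][c] = x[t - 1][c]
            else:
                out[t][c] = x[t][c]  # unshifted channels
    return out


def squeeze_excitation(feat: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Standard SE gate on a [spatial position][channel] map.

    w1 has shape [hidden][channel] and w2 has shape [channel][hidden]
    (hypothetical, externally supplied weights; biases omitted for brevity).
    """
    n_pos, n_ch = len(feat), len(feat[0])
    # Squeeze: global average of each channel over all spatial positions.
    z = [sum(feat[p][c] for p in range(n_pos)) / n_pos for c in range(n_ch)]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid, one gate per channel.
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hc for w, hc in zip(row, h)))) for row in w2]
    # Scale: reweight every position's channels by their gates.
    return [[feat[p][c] * s[c] for c in range(n_ch)] for p in range(n_pos)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def joint_embedding_loss(video_emb: List[float], label_embs: Matrix,
                         target: int, temperature: float = 0.1) -> float:
    """Assumed joint-embedding loss: cross-entropy over scaled cosine similarities.

    Pulls the video embedding towards its ground-truth label embedding and away
    from the embeddings of the other (possibly visually similar) labels.
    """
    logits = [cosine(video_emb, e) / temperature for e in label_embs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]
```

In a real network these operations would run on batched tensors inside each ResNet block, and the loss would be averaged over a mini-batch; nested lists are used here only to keep the sketch self-contained. Because TSM only moves existing activations between neighbouring frames, it adds temporal modelling at essentially zero extra parameter cost, which is one plausible reason to pair it with a 2D ResNet for subtle, short-lived movements.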
