A micro-action is an imperceptible non-verbal behaviour characterised by low-intensity movement. It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment. However, identifying, differentiating, and understanding micro-actions is challenging because these subtle human behaviours are imperceptible and hard to capture in everyday life. In this study, we collect a new micro-action dataset, designated Micro-action-52 (MA-52), and propose a benchmark named the micro-action network (MANet) for the micro-action recognition (MAR) task. Uniquely, MA-52 provides a whole-body perspective including gestures and upper- and lower-limb movements, aiming to reveal comprehensive micro-action cues. In detail, MA-52 contains 52 micro-action categories along with seven body-part labels, and encompasses a full array of realistic and natural micro-actions, covering 205 participants and 22,422 video instances collected from psychological interviews. On the proposed dataset, we evaluate MANet and nine other prevalent action recognition methods. MANet incorporates squeeze-and-excitation (SE) and temporal shift module (TSM) components into the ResNet architecture to model the spatiotemporal characteristics of micro-actions. A joint-embedding loss is then designed for semantic matching between videos and action labels; this loss helps distinguish visually similar yet distinct micro-action categories. An extended application to emotion recognition demonstrates one important value of the proposed dataset and method. In the future, we will explore human behaviour, emotion, and psychological assessment in greater depth. The dataset and source code are released at https://github.com/VUT-HFUT/Micro-Action.
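As a rough illustration (not the authors' implementation), the two components named above can be sketched in NumPy: the temporal shift module exchanges a fraction of channels between neighbouring frames, and the joint-embedding loss scores video embeddings against per-class label-text embeddings with a softmax cross-entropy. The function names, the 1/8 shift ratio (a common TSM default), and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def temporal_shift(x, shift_ratio=8):
    """Shift 1/shift_ratio of channels one frame backward and forward in time.
    x: (T, C) per-frame features (spatial dims pooled away for brevity).
    The 1/8 ratio follows the common TSM default; it is an assumption here."""
    t, c = x.shape
    fold = c // shift_ratio
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                # pull future frames back
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # push past frames forward
    out[:, 2 * fold:] = x[:, 2 * fold:]           # remaining channels untouched
    return out

def joint_embedding_loss(video_emb, label_emb, labels, temperature=0.07):
    """Cross-entropy over cosine similarities between video embeddings and
    class label-text embeddings (a CLIP-style sketch of semantic matching)."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    w = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    logits = v @ w.T / temperature                    # (batch, num_classes)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

In a real model the shift would be applied inside residual blocks on (N, T, C, H, W) tensors and the embeddings would come from learned video and text encoders; the sketch only shows the core tensor manipulation and loss arithmetic.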