A New Multi-Domain Benchmark for Micro-Action Recognition and Detection

Micro-actions are subtle, short-duration, low-amplitude whole-body movements that can reveal latent intentions, involuntary reactions, and fine-grained affective changes. Our previous MA-52 benchmark laid an important foundation for micro-action recognition, but it remains limited in scale, scene diversity, task coverage, and evaluation protocols. To advance micro-action analysis toward more realistic and comprehensive settings, we introduce MMA-82, a large-scale multi-domain extension of MA-52. MMA-82 expands the label space from 52 to 82 fine-grained micro-action categories and covers four distinct domains: laboratory interviews, street interviews, psychiatric patient interviews, and emotion-rich television videos. In total, it contains 77,856 annotated instances from 454 subjects. Building on MMA-82, we establish two core tasks: Micro-Action Recognition and Multi-label Micro-Action Detection. For recognition, we further define in-domain and cross-domain protocols, the latter including few-shot and zero-shot settings, to evaluate model robustness, transferability, and generalization. Extensive experiments show that current methods still struggle with realistic micro-action understanding, especially under domain shift, long-tailed category distributions, and complex temporal localization. Beyond benchmarking, we investigate the relationship between micro-actions and emotion, showing that micro-actions are strongly associated with emotional states and provide cues complementary to facial micro-expressions that improve emotion recognition. These results demonstrate that MMA-82 is a comprehensive and challenging benchmark for realistic micro-action analysis and a valuable resource for human-centered AI. MMA-82 is available at https://lpynow.github.io/MMA-82-AIM/.
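
To make the cross-domain recognition protocols concrete, the following is a minimal sketch of how a held-out-domain split could be constructed. It is an illustration under stated assumptions, not the benchmark's official protocol: the function name `cross_domain_split`, the `(domain, label, clip_id)` record layout, the domain identifiers, and the `k_shot` parameter are all hypothetical. In particular, "zero-shot" is read here as using no training clips from the target domain, which may differ from the definition used in the benchmark.

```python
import random
from collections import defaultdict

# Domain names follow the four domains listed in the abstract;
# the identifiers themselves are hypothetical.
DOMAINS = ["lab_interview", "street_interview", "psychiatric_interview", "tv_video"]


def cross_domain_split(samples, target_domain, k_shot=0, seed=0):
    """Split (domain, label, clip_id) records around a held-out target domain.

    k_shot=0 gives a zero-shot style split: no target-domain clip is used
    for training. k_shot>0 moves up to k clips per label from the target
    domain into the training set (few-shot style).
    """
    rng = random.Random(seed)

    # All clips from the source domains are available for training.
    train = [s for s in samples if s[0] != target_domain]
    target = [s for s in samples if s[0] == target_domain]

    # Group target-domain clips by label so the k shots are drawn per class.
    by_label = defaultdict(list)
    for s in target:
        by_label[s[1]].append(s)

    test = []
    for label in sorted(by_label):
        clips = by_label[label]
        rng.shuffle(clips)
        train.extend(clips[:k_shot])  # few-shot support clips (empty if k_shot=0)
        test.extend(clips[k_shot:])   # remaining target clips are evaluated on
    return train, test
```

Calling `cross_domain_split(samples, "tv_video")` would hold out the television domain entirely, while `cross_domain_split(samples, "tv_video", k_shot=5)` would add up to five target-domain clips per category to the training set. Drawing the shots per label rather than uniformly is a design choice meant to keep rare categories represented under a long-tailed distribution.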
