In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE), which aims to identify the muscular regions currently activated in humans performing a specific activity. Video-based AMGE is an important yet overlooked problem. To this end, we provide the MuscleMap136 dataset, featuring more than 15K video clips with 136 different activities and 20 labeled muscle groups. This dataset opens new vistas for multiple video-based applications in sports and rehabilitation medicine. We further complement the main MuscleMap136 dataset, which specifically targets physical exercise, with Muscle-UCF90 and Muscle-HMDB41, new variants of the well-known activity recognition benchmarks extended with AMGE annotations. With MuscleMap136, we uncover limitations of state-of-the-art human activity recognition architectures when dealing with multi-label muscle annotations, particularly when good generalization to unseen activities is required. To address this, we propose a new multimodal transformer-based model, TransM3E, which surpasses current activity recognition models on AMGE, especially when it comes to previously unseen activities. The datasets and code will be publicly available at https://github.com/KPeng9510/MuscleMap.