In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE), which aims to identify the muscle regions that are active during physical activity. To this end, we provide the MuscleMap136 dataset, featuring more than 15K video clips with 136 different activities and 20 labeled muscle groups. This dataset opens the door to multiple video-based applications in sports and rehabilitation medicine. We further complement the main MuscleMap136 dataset, which specifically targets physical exercise, with Muscle-UCF90 and Muscle-HMDB41, new variants of the well-known activity recognition benchmarks extended with AMGE annotations. For an AMGE model to be applicable in real-life situations, it must generalize well to types of physical activity that are not present during training and that involve new combinations of activated muscles. Our benchmark therefore also covers an evaluation setting in which the model is tested on activity types excluded from the training set. Our experiments reveal that the generalizability of existing architectures adapted for the AMGE task remains a challenge. We therefore also propose a new approach, TransM3E, which employs a transformer-based model with cross-modal multi-label knowledge distillation and surpasses all popular video classification models on both previously seen and new types of physical activities. The datasets and code will be publicly available at https://github.com/KPeng9510/MuscleMap.
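To make the unseen-activity evaluation setting concrete, the following minimal sketch shows one way such a split could be built: whole activity types are held out, so no test activity appears in training, and each clip carries a multi-hot target over the 20 muscle groups. The clip tuples, the function names `multi_hot` and `split_by_activity`, the held-out fraction, and the random selection of held-out activities are all illustrative assumptions, not the split protocol or data format of MuscleMap136.

```python
import random
from typing import List, Set, Tuple

# Hypothetical clip record: (clip_id, activity_name, indices of active muscle groups).
Clip = Tuple[str, str, Set[int]]

NUM_MUSCLE_GROUPS = 20  # number of labeled muscle groups stated in the abstract


def multi_hot(active: Set[int], num_groups: int = NUM_MUSCLE_GROUPS) -> List[int]:
    """Encode a set of active muscle-group indices as a multi-hot target vector."""
    return [1 if i in active else 0 for i in range(num_groups)]


def split_by_activity(
    clips: List[Clip], held_out_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Clip], List[Clip], Set[str]]:
    """Hold out entire activity types so that test activities never occur in training.

    The fraction and the random choice of held-out activities are assumptions
    made for illustration only.
    """
    activities = sorted({activity for _, activity, _ in clips})
    rng = random.Random(seed)
    rng.shuffle(activities)
    n_held = max(1, round(len(activities) * held_out_fraction))
    unseen = set(activities[:n_held])
    train = [c for c in clips if c[1] not in unseen]
    test_unseen = [c for c in clips if c[1] in unseen]
    return train, test_unseen, unseen


if __name__ == "__main__":
    # Toy data with made-up activities and muscle indices.
    demo = [
        ("c1", "squat", {0, 3}),
        ("c2", "squat", {0}),
        ("c3", "push_up", {5, 7}),
        ("c4", "plank", {2}),
        ("c5", "lunge", {0, 4}),
    ]
    train, test_unseen, unseen = split_by_activity(demo)
    print("held-out activities:", sorted(unseen))
    print("train clips:", [c[0] for c in train])
    print("unseen-test clips:", [c[0] for c in test_unseen])
    print("target for c1:", multi_hot(demo[0][2]))
```

Splitting at the level of activity types, rather than individual clips, is what makes the test measure generalization to new activities and new muscle combinations instead of to new recordings of familiar exercises.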