In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE), which aims to identify the muscle regions that are active during physical activity in the wild. To this end, we provide the MuscleMap dataset, featuring more than 15K video clips with 135 different activities and 20 labeled muscle groups. This dataset opens the door to multiple video-based applications in sports and rehabilitation medicine under flexible environment constraints. MuscleMap is constructed from YouTube videos and specifically targets High-Intensity Interval Training (HIIT) exercises in the wild. For an AMGE model to be applicable in real-life situations, it must generalize well to types of physical activity that were not present during training and that involve new combinations of activated muscles. Our benchmark therefore also covers an evaluation setting in which the model is exposed to activity types excluded from the training set. Our experiments reveal that generalization remains a challenge for existing architectures adapted to the AMGE task. We therefore also propose a new approach, TransM3E, which employs multi-modality feature fusion between a video transformer model and a skeleton-based graph convolution model, with novel cross-modal knowledge distillation executed on multi-classification tokens. The proposed method surpasses all popular video classification models on both previously seen and new types of physical activities. The contributed dataset and code will be publicly available at https://github.com/KPeng9510/MuscleMap.
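As a minimal illustration of the evaluation setting described in the abstract, the sketch below holds out a subset of activity types from training and encodes each clip's activated muscle groups as a 20-dimensional multi-hot vector. The function names, the clip dictionary layout, the held-out fraction, and the random split are all hypothetical and chosen only for illustration; the actual MuscleMap splits and the mapping from indices to muscle groups are defined by the dataset release, not by this sketch.

```python
import random

NUM_MUSCLE_GROUPS = 20  # count taken from the abstract; index-to-muscle mapping is not specified here


def multi_hot(active_indices, num_groups=NUM_MUSCLE_GROUPS):
    """Encode a collection of activated muscle-group indices as a multi-hot list."""
    vec = [0] * num_groups
    for idx in active_indices:
        if not 0 <= idx < num_groups:
            raise ValueError(f"muscle index {idx} out of range")
        vec[idx] = 1
    return vec


def split_activities(activities, holdout_fraction=0.2, seed=0):
    """Split activity types into known (seen in training) and new (held-out) sets.

    holdout_fraction and seed are assumptions for this sketch only.
    """
    names = sorted(set(activities))
    rng = random.Random(seed)
    rng.shuffle(names)
    n_new = max(1, int(len(names) * holdout_fraction))
    return set(names[n_new:]), set(names[:n_new])


def partition_clips(clips, known, new):
    """Route clips by activity type: known-activity clips vs. new-activity clips.

    In practice, known-activity clips would be further divided into
    train/validation/test subsets; that step is omitted here.
    """
    known_clips = [c for c in clips if c["activity"] in known]
    new_clips = [c for c in clips if c["activity"] in new]
    return known_clips, new_clips
```

A model evaluated this way would be trained only on clips from the known activities and then scored separately on known-activity and new-activity test clips, so that the gap between the two scores indicates how well it generalizes to unseen combinations of activated muscles.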