With the continuing worldwide growth in the popularity of fitness activities, fitness activity analytics has become an emerging research topic in computer vision. While a variety of new tasks and algorithms have been proposed recently, there is a growing demand for data resources that offer high-quality data, fine-grained labels, and diverse environments. In this paper, we present FLAG3D, a large-scale 3D fitness activity dataset with language instruction, containing 180K sequences across 60 categories. FLAG3D features the following three aspects: 1) accurate and dense 3D human poses captured with an advanced MoCap system to handle complex activities and large movements, 2) detailed and professional language instructions describing how to perform a specific activity, and 3) versatile video resources from a high-tech MoCap system, rendering software, and cost-effective smartphones in natural environments. Extensive experiments and in-depth analysis show that FLAG3D offers substantial research value for various challenges, such as cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation. Our dataset and source code are publicly available at https://andytang15.github.io/FLAG3D.