Understanding human motion beyond surface kinematics is crucial for motion analysis, rehabilitation, and injury risk assessment. However, progress in this domain is limited by the lack of large-scale datasets with biomechanical annotations and by the inability of existing approaches to infer internal biomechanical states directly from visual observations. In this paper, we introduce a simulation-based framework for estimating muscle activations from existing motion capture datasets and use it to build BioHuman10M, a large-scale dataset with synchronized video, motion, and muscle activations. Building on BioHuman10M, we propose BioHuman, an end-to-end model that takes monocular video as input and jointly predicts human motion and muscle activations, bridging visual observations and internal biomechanical states. Extensive experiments demonstrate that BioHuman accurately reconstructs both kinematic motion and muscle activity and generalizes across diverse subjects and motions. We believe our approach establishes a new benchmark for video-based biomechanical understanding and opens up new possibilities for physically grounded human modeling.