Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but such data are scarce and expensive to collect, limiting scalability and diversity. Recent work, exemplified by Humanoid-X, attempts to scale data collection by converting large-scale internet videos into humanoid motions. However, these conversions often introduce physical artifacts such as floating, ground penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We evaluate PHUMA in two settings: (i) imitation of unseen motions from self-recorded test videos and (ii) path following with pelvis-only guidance. In both cases, policies trained on PHUMA outperform those trained on Humanoid-X and AMASS, achieving significant gains in imitating diverse motions. The code is available at https://davian-robotics.github.io/PHUMA.