Motion imitation is a promising approach to humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, which limits scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos into humanoid motion data, as exemplified by Humanoid-X. However, the resulting motions often suffer from physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. To address this, we introduce PHUMA, a Physically Reliable HUMAnoid locomotion dataset. PHUMA is produced by a two-stage pipeline that combines physics-aware curation with physics-constrained retargeting, aggregating both motion capture and internet video into a physically reliable 73-hour corpus. On motion tracking benchmarks, PHUMA-trained policies achieve higher success rates than those trained on AMASS or Humanoid-X, and they transfer zero-shot to a real Unitree G1. The code is available at https://davian-robotics.github.io/PHUMA.
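The three artifacts named above can be made concrete with simple per-frame checks on foot trajectories. The sketch below is a hypothetical illustration, not PHUMA's curation procedure. It assumes a z-up world frame with the ground plane at z = 0, foot positions in meters sampled at a fixed frame rate, and arbitrary thresholds. The function names `artifact_report` and `keep_clip` and all threshold values are invented for this example.

```python
import numpy as np


def artifact_report(foot_pos, fps, float_tol=0.05, pen_tol=0.02,
                    contact_tol=0.03, skate_tol=0.2):
    """Fraction of frames showing each artifact (hypothetical thresholds).

    foot_pos: array of shape (T, F, 3), positions of F feet over T frames,
    in meters, z-up, ground plane at z = 0 (assumed convention).
    """
    foot_pos = np.asarray(foot_pos, dtype=float)
    heights = foot_pos[..., 2]                   # (T, F) foot heights
    lowest = heights.min(axis=1)                 # (T,) lowest foot per frame

    # Floating: even the lowest foot is clearly above the ground.
    floating = lowest > float_tol
    # Penetration: some foot is clearly below the ground.
    penetration = lowest < -pen_tol

    # Foot skating: a foot near the ground moves horizontally too fast.
    horiz_vel = np.diff(foot_pos[..., :2], axis=0) * fps   # (T-1, F, 2), m/s
    speed = np.linalg.norm(horiz_vel, axis=-1)             # (T-1, F)
    in_contact = np.abs(heights[:-1]) < contact_tol        # (T-1, F)
    skating = (in_contact & (speed > skate_tol)).any(axis=1)

    return {
        "floating": float(floating.mean()),
        "penetration": float(penetration.mean()),
        "skating": float(skating.mean()) if skating.size else 0.0,
    }


def keep_clip(report, max_frac=0.05):
    """Keep a clip only if every artifact affects at most max_frac of frames."""
    return all(frac <= max_frac for frac in report.values())


if __name__ == "__main__":
    fps = 30
    standing = np.zeros((30, 2, 3))          # both feet flat on the ground
    hovering = standing.copy()
    hovering[..., 2] = 0.2                   # whole clip 20 cm above ground
    sliding = standing.copy()
    sliding[:, :, 0] = np.arange(30)[:, None] / fps  # feet glide at 1 m/s

    for name, clip in [("standing", standing), ("hovering", hovering),
                       ("sliding", sliding)]:
        report = artifact_report(clip, fps)
        print(name, report, "keep" if keep_clip(report) else "reject")
```

Checking the lowest foot per frame, rather than every foot, lets a swing foot leave the ground without being flagged as floating. Contact is approximated here with a height threshold only; a physics-based curation stage would likely use richer signals such as simulated contact forces.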