A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references for training reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts such as foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, extending a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
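To make the Laplacian deformation objective concrete, the following is a minimal sketch of the idea, not OmniRetarget's implementation. It assumes uniform neighbor weights and a hand-written neighbor list over a toy set of keypoints; the abstract does not specify how the interaction mesh is built, how the terms are weighted, or how the kinematic constraints are solved, so the names `laplacian_coords`, `deformation_energy`, and the toy data are all hypothetical.

```python
import numpy as np


def laplacian_coords(points, neighbors):
    """Uniform-weight Laplacian coordinate of each vertex:
    the vertex position minus the mean of its neighbors' positions."""
    points = np.asarray(points, dtype=float)
    return np.stack(
        [points[i] - points[nbrs].mean(axis=0) for i, nbrs in enumerate(neighbors)]
    )


def deformation_energy(source_pts, target_pts, neighbors):
    """Sum of squared differences between the Laplacian coordinates
    of two meshes that share the same connectivity."""
    diff = laplacian_coords(source_pts, neighbors) - laplacian_coords(
        target_pts, neighbors
    )
    return float(np.sum(diff**2))


# Toy "interaction mesh" (assumed data): four keypoints standing in for
# body, object, and terrain samples, with every vertex linked to the others.
human = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
)
neighbors = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]

# A rigid translation keeps all relative positions, so the energy stays near zero.
shifted = human + np.array([0.5, -0.2, 0.1])
e_shift = deformation_energy(human, shifted, neighbors)

# Moving a single keypoint changes its relation to its neighbors and is penalized.
perturbed = human.copy()
perturbed[0] += np.array([0.3, 0.0, 0.0])
e_pert = deformation_energy(human, perturbed, neighbors)

print(f"translation energy:  {e_shift:.6f}")
print(f"perturbation energy: {e_pert:.6f}")
```

Because the energy depends only on each vertex's position relative to its neighbors, it penalizes changes in spatial relationships (for example, a hand drifting away from an object) rather than absolute pose differences. A retargeting method of the kind described would minimize such an energy over robot configurations subject to kinematic constraints; that optimization step is omitted here.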