Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models to character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories are physically compliant, they may deviate substantially from the original motion. To address this issue, we propose PhysMoDPO, a Direct Preference Optimization framework. Unlike prior work that relies on hand-crafted physics-aware heuristics such as foot-sliding penalties, we integrate the WBC into our training pipeline and optimize the diffusion model so that the WBC output complies both with physics and with the original text instructions. To train PhysMoDPO, we deploy physics-based and task-specific rewards and use them to assign preferences to synthesized trajectories. Our extensive experiments on text-to-motion and spatial control tasks demonstrate consistent improvements from PhysMoDPO in both physical realism and task-related metrics on simulated robots. Moreover, we show that PhysMoDPO yields significant improvements in zero-shot motion transfer in simulation and in real-world deployment on a G1 humanoid robot.
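The core idea of assigning preferences from combined rewards can be sketched as follows. This is a minimal illustration, not the authors' implementation: the reward functions (`physics_reward`, `task_reward`) and the weights are hypothetical proxies standing in for the physics-based and task-specific rewards described above.

```python
# Minimal sketch of reward-based preference assignment between two
# synthesized trajectories, as used to build DPO preference pairs.
# All reward definitions below are hypothetical placeholders.
import numpy as np


def physics_reward(traj: np.ndarray) -> float:
    # Hypothetical physics proxy: penalize mean third-order differences
    # (jerk) of the trajectory; smoother motion scores higher.
    jerk = np.diff(traj, n=3, axis=0)
    return -float(np.abs(jerk).mean())


def task_reward(traj: np.ndarray, target: np.ndarray) -> float:
    # Hypothetical task proxy: negative distance of the final pose
    # to a target pose (e.g., a spatial control goal).
    return -float(np.linalg.norm(traj[-1] - target))


def assign_preference(traj_a, traj_b, target, w_phys=1.0, w_task=1.0):
    """Return (winner, loser) labels based on the combined reward."""
    r_a = w_phys * physics_reward(traj_a) + w_task * task_reward(traj_a, target)
    r_b = w_phys * physics_reward(traj_b) + w_task * task_reward(traj_b, target)
    return ("a", "b") if r_a >= r_b else ("b", "a")
```

The resulting (winner, loser) trajectory pairs would then serve as the preference data for the DPO objective.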