Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias: the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate simulator inductive bias, our key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We therefore introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves the execution success rate by 52.8% over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, demonstrating effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.