This paper proposes a novel alternative to existing sim-to-real methods for training control policies with simulated experiences. Unlike prior methods that typically rely on domain randomization over a fixed finite set of parameters, the proposed approach injects state-dependent perturbations into the input joint torque during forward simulation. These perturbations are designed to simulate a broader spectrum of reality gaps than standard parameter randomization without requiring additional training. By using neural networks as flexible perturbation generators, the proposed method can represent complex, state-dependent uncertainties, such as nonlinear actuator dynamics and contact compliance, that parametric randomization cannot capture. Experimental results demonstrate that the proposed approach enables humanoid locomotion policies to achieve superior robustness against complex, unseen reality gaps in both simulation and real-world deployment.
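The abstract describes the mechanism only at a high level: a neural network maps the current state to a perturbation that is added to the joint torque during forward simulation. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation. The names `RandomTorquePerturbation` and `step`, the two-layer tanh network, the per-episode resampling of untrained random weights, the `max_torque` bound, and the toy unit-inertia joint model are all hypothetical choices made here, since the abstract specifies none of them.

```python
import numpy as np


class RandomTorquePerturbation:
    """Untrained MLP mapping joint state (q, qd) to a bounded torque offset.

    Hypothetical sketch: the architecture and weight distribution are
    assumptions, not taken from the paper.
    """

    def __init__(self, n_joints, hidden=32, max_torque=2.0, seed=None):
        rng = np.random.default_rng(seed)
        in_dim = 2 * n_joints
        # Assumption: weights are sampled at random (e.g. once per episode)
        # rather than trained, so each sample is a different perturbation field.
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = rng.normal(0.0, 0.1, hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, n_joints))
        self.max_torque = max_torque

    def __call__(self, q, qd):
        x = np.concatenate([q, qd])
        h = np.tanh(x @ self.w1 + self.b1)
        # The outer tanh keeps every component within [-max_torque, max_torque].
        return self.max_torque * np.tanh(h @ self.w2)


def step(q, qd, tau_cmd, perturb, dt=0.002, inertia=1.0):
    """One semi-implicit Euler step of a toy joint model (stand-in for a simulator)."""
    # The state-dependent perturbation is added to the commanded torque
    # before the forward dynamics are integrated.
    tau = tau_cmd + perturb(q, qd)
    qd_next = qd + dt * tau / inertia
    q_next = q + dt * qd_next
    return q_next, qd_next


# Usage: sample one perturbation network and roll the toy model forward.
perturb = RandomTorquePerturbation(n_joints=3, seed=0)
q, qd = np.zeros(3), np.zeros(3)
for _ in range(10):
    q, qd = step(q, qd, tau_cmd=np.zeros(3), perturb=perturb)
```

In a full physics simulator the same pattern would apply at the point where the policy's torque command is handed to the dynamics. Because the offset depends on the state rather than on a fixed set of physical parameters, a single sampled network can imitate effects such as velocity-dependent actuator losses that a randomized scalar parameter would not express.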