This paper proposes an alternative to existing sim-to-real methods for training control policies on simulated experience. Whereas prior methods typically rely on domain randomization over a fixed, finite set of simulator parameters, the proposed approach injects state-dependent perturbations into the input joint torques during forward simulation. These perturbations are designed to emulate a broader spectrum of reality gaps than standard parameter randomization, and generating them requires no additional training. Because neural networks serve as flexible perturbation generators, the method can represent complex, state-dependent uncertainties, such as nonlinear actuator dynamics and contact compliance, that parametric randomization cannot capture. Experimental results show that humanoid locomotion policies trained with the proposed approach achieve superior robustness against complex, unseen reality gaps, both in simulation and in real-world deployment.
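To make the core idea concrete, the following is a minimal sketch of state-dependent torque perturbation, not the paper's implementation. It assumes that the perturbation generator is a small, randomly initialized (untrained) multilayer perceptron that maps joint positions and velocities to a bounded torque offset, and that the simulator is a toy unit-inertia integrator. The names `RandomTorquePerturbation`, `perturbed_step`, `scale`, and `tau_limit` are hypothetical and chosen only for illustration.

```python
import numpy as np


class RandomTorquePerturbation:
    """Hypothetical perturbation generator: an untrained MLP with random weights.

    Assumption: the network is sampled once (e.g. per episode) and never trained.
    """

    def __init__(self, n_joints, hidden=32, scale=0.1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        in_dim = 2 * n_joints  # input is [q, qd]
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, hidden))
        self.b1 = rng.normal(0.0, 0.1, size=hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, n_joints))
        self.scale = scale  # fraction of the torque limit the offset may reach

    def __call__(self, q, qd, tau_limit):
        # The offset depends on the current state, unlike a fixed parameter draw.
        h = np.tanh(np.concatenate([q, qd]) @ self.w1 + self.b1)
        # The outer tanh bounds each offset by scale * tau_limit.
        return self.scale * tau_limit * np.tanh(h @ self.w2)


def perturbed_step(q, qd, tau, perturb, tau_limit, dt=0.002, inertia=1.0):
    """One semi-implicit Euler step of a toy joint model with perturbed torque."""
    tau_applied = tau + perturb(q, qd, tau_limit)  # inject the perturbation
    qdd = tau_applied / inertia
    qd_next = qd + dt * qdd
    q_next = q + dt * qd_next
    return q_next, qd_next, tau_applied
```

In a training loop, one would presumably resample the generator's weights at the start of each episode, so that the policy encounters a different state-dependent torque error every time; that resampling schedule is likewise an assumption of this sketch.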