Training deep reinforcement learning (DRL) locomotion policies often requires massive amounts of data to converge to the desired behaviour. Simulators provide a cheap and abundant source of such data. For successful sim-to-real transfer, exhaustively engineered approaches such as system identification, dynamics randomization, and domain adaptation are generally employed. As an alternative, we investigate a simple strategy of random force injection (RFI) to perturb system dynamics during training. We show that applying random forces enables us to emulate dynamics randomization, which allows us to obtain locomotion policies that are robust to variations in system dynamics. We further extend RFI by introducing an episodic actuation offset, and refer to the resulting method as extended random force injection (ERFI). We demonstrate that ERFI provides additional robustness to variations in system mass, offering on average a 53% improvement in performance over RFI. We also show that ERFI is sufficient for successful sim-to-real transfer on two different quadrupedal platforms, ANYmal C and Unitree A1, even for perceptive locomotion over uneven terrain in outdoor environments.
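To make the idea concrete, the following is a minimal sketch of how random force injection with an episodic actuation offset might be layered on top of a policy's commanded joint torques. The abstract does not specify sampling distributions, magnitudes, or where the perturbation enters the simulator, so the uniform sampling, the two limit parameters, and the `ERFIPerturbation` class itself are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


class ERFIPerturbation:
    """Hypothetical helper: perturbs commanded joint torques during training.

    Assumptions (not taken from the abstract): perturbations are drawn
    uniformly, act directly on joint torques, and are bounded by
    `rfi_limit` (per-step noise) and `offset_limit` (per-episode offset).
    """

    def __init__(self, num_joints, rfi_limit, offset_limit, seed=0):
        self.num_joints = num_joints
        self.rfi_limit = rfi_limit
        self.offset_limit = offset_limit
        self.rng = np.random.default_rng(seed)
        self.offset = np.zeros(num_joints)

    def reset(self):
        # Episodic actuation offset: sampled once and held for the episode.
        self.offset = self.rng.uniform(
            -self.offset_limit, self.offset_limit, self.num_joints
        )

    def apply(self, torques):
        # Random force injection: fresh noise at every control step.
        noise = self.rng.uniform(
            -self.rfi_limit, self.rfi_limit, self.num_joints
        )
        return np.asarray(torques, dtype=float) + noise + self.offset
```

In a training loop, `reset()` would be called at the start of each episode and `apply()` at every control step, just before the torques are handed to the simulator. Setting `offset_limit` to zero recovers plain RFI under these assumptions.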