This paper proposes a simple sim-to-real strategy for Deep Reinforcement Learning (DRL), called Roll-Drop, that uses dropout during simulation to account for observation noise at deployment without explicitly modelling the noise distribution for each state. DRL is a promising approach for controlling robots in highly dynamic, feedback-based manoeuvres, and accurate simulators are crucial for providing cheap and abundant data from which to learn the desired behaviour. Nevertheless, simulated data are noiseless and generally exhibit a distributional shift that challenges deployment on real machines, where sensor readings are affected by noise. The standard solution is to model this noise and inject it during training; whereas this requires thorough system identification, Roll-Drop improves robustness to sensor noise by tuning only a single parameter. We demonstrate an 80% success rate when up to 25% noise is injected into the observations, with twice the robustness of the baselines. We deploy the controller trained in simulation on a Unitree A1 platform and assess this improved robustness on the physical system.
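The core idea, a single dropout rate applied to the observations while collecting simulated experience, can be illustrated with a short sketch. It assumes inverted dropout applied element-wise to the observation vector with rate `p`, active only during simulated rollouts and disabled at deployment; the helper names `roll_drop` and `policy_input` are hypothetical, and this is not the authors' implementation (the paper may apply dropout at a different point in the policy network).

```python
import numpy as np


def roll_drop(obs: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: inverted dropout on an observation vector.

    Each entry is zeroed with probability p; surviving entries are rescaled
    by 1 / (1 - p) so the expected value of every entry is unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout rate p must be in [0, 1)")
    if p == 0.0:
        return obs.copy()
    keep = rng.random(obs.shape) >= p  # keep each entry with probability 1 - p
    return obs * keep / (1.0 - p)


def policy_input(obs: np.ndarray, p: float, rng: np.random.Generator,
                 training: bool) -> np.ndarray:
    """Hypothetical wrapper: corrupt observations only during simulated rollouts.

    At deployment (training=False) the observation reaches the policy unchanged,
    so the single tunable parameter is the dropout rate p.
    """
    return roll_drop(obs, p, rng) if training else obs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = np.ones(8)
    print(policy_input(obs, 0.2, rng, training=True))   # entries are 0.0 or 1.25
    print(policy_input(obs, 0.2, rng, training=False))  # unchanged observation
```

Rescaling the kept entries by `1 / (1 - p)` keeps the mean input seen in training equal to the clean input seen at deployment, so only the perturbation, and not the input scale, differs between the two phases.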