Sim-to-real transfer is a mainstream way to cope with the large number of trials that typical deep reinforcement learning methods require. However, transferring a policy trained in simulation to actual hardware remains an open challenge because of the reality gap. In legged robots in particular, actuator characteristics have a considerable influence on sim-to-real transfer. Two challenges arise: 1) high-reduction-ratio gears are widely used in actuators, and the reality gap becomes especially pronounced when backdrivability must be taken into account to control joints compliantly; 2) stable bipedal locomotion is difficult to achieve, so typical system identification methods are not sufficient for transferring the policy. To address these challenges, we propose 1) a new simulation model of gears and 2) a system identification method that can make use of failed attempts. We verify the method's effectiveness on a biped robot, the ROBOTIS-OP3: the sim-to-real transferred policy stabilizes the robot under severe disturbances and walks on uneven surfaces without using force or torque sensors.