We present a framework for learning visually guided quadruped locomotion that integrates exteroceptive sensing and central pattern generators (CPGs), i.e., systems of coupled oscillators, into deep reinforcement learning (DRL). Using both exteroceptive and proprioceptive sensing, the agent learns to coordinate rhythmic behavior among different oscillators to track velocity commands, while overriding these commands when needed to avoid collisions with the environment. We investigate several open robotics and neuroscience questions: 1) What is the role of explicit couplings between oscillators, and can such couplings improve the robustness of sim-to-real transfer for navigation? 2) What are the effects of using a memory-enabled vs. a memory-free policy network on robustness, energy efficiency, and tracking performance in sim-to-real navigation tasks? 3) How do animals tolerate high sensorimotor delays, yet still produce smooth and robust gaits? To answer these questions, we train our perceptive locomotion policies in simulation and perform sim-to-real transfers to the Unitree Go1 quadruped, where we observe robust navigation in a variety of scenarios. Our results show that the CPG, explicit interoscillator couplings, and memory-enabled policy representations all benefit energy efficiency, robustness to noise and to sensory delays of 90 ms, and tracking performance, supporting successful sim-to-real transfer for navigation tasks. Video results can be found at https://youtu.be/wpsbSMzIwgM.
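To make the notion of a CPG as a system of coupled oscillators concrete, the following is a minimal sketch of four phase oscillators (one per leg) with explicit pairwise couplings that pull the legs toward a trot. It is a generic Kuramoto-style illustration, not the oscillator model or coupling scheme used in this work; the leg order, frequency, coupling gain, and the names `TROT_OFFSETS`, `simulate_cpg`, and `phase_error` are all assumptions made for the example.

```python
import numpy as np

# Assumed leg order: FR, FL, RR, RL. For a trot, diagonal legs move in phase
# and the two diagonal pairs are half a cycle apart.
TROT_OFFSETS = np.array([0.0, np.pi, np.pi, 0.0])


def simulate_cpg(theta0, freq_hz=2.0, coupling=5.0, dt=0.001, steps=5000):
    """Integrate four coupled phase oscillators with forward Euler.

    Dynamics (illustrative, all-to-all coupling):
        dtheta_i/dt = omega + K * sum_j sin(theta_j - theta_i - phi_ij)
    where phi_ij is the desired phase difference between legs j and i.
    """
    theta = np.array(theta0, dtype=float)
    omega = 2.0 * np.pi * freq_hz
    # desired[i, j] is the target value of theta_j - theta_i
    desired = TROT_OFFSETS[None, :] - TROT_OFFSETS[:, None]
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
        dtheta = omega + coupling * np.sin(diff - desired).sum(axis=1)
        theta = theta + dt * dtheta
    return theta


def phase_error(theta):
    """Largest deviation (rad) from the trot offsets, measured relative to leg 0."""
    rel = (theta - theta[0]) - (TROT_OFFSETS - TROT_OFFSETS[0])
    # Wrap each deviation into (-pi, pi] before taking the magnitude.
    return np.abs(np.angle(np.exp(1j * rel))).max()


if __name__ == "__main__":
    final = simulate_cpg([0.3, 1.0, 2.0, 0.5])
    print("max phase error (rad):", phase_error(final))
```

In a learning setup of the kind described above, a policy would typically modulate quantities such as the oscillator frequency or amplitude rather than joint positions directly. Setting `coupling=0.0` in this sketch removes the explicit couplings, in which case the initial phase differences persist unless something else coordinates the legs.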