Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

This paper presents a comprehensive study of using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Rather than focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture that uses both a long-term and a short-term input/output (I/O) history of the robot. When trained through the proposed end-to-end RL approach, this control architecture consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also examines the adaptivity and robustness that the proposed RL system brings to locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance with disturbances. The resulting control policies are successfully deployed on Cassie, a torque-controlled, human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.
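
The dual-history idea can be illustrated with a minimal sketch. The abstract does not give window lengths, the encoder design, or the input layout, so everything below is an assumption. The names `DualHistoryBuffer`, `conv1d_encode`, and `policy_input`, the window lengths, the fixed averaging kernel, and the concatenation order are hypothetical and do not represent the authors' implementation. Each control step stores one I/O pair (observation plus action). The policy input then concatenates a compact encoding of the long history, the raw short history, and a task command such as a desired velocity.

```python
from collections import deque
from typing import Deque, List, Sequence


class DualHistoryBuffer:
    """Hypothetical store of the robot's recent I/O steps (observation + action).

    Exposes a short window (flattened, for direct policy input) and a long
    window (time-major matrix, for a temporal encoder). Window lengths are
    illustrative assumptions, not values taken from the paper.
    """

    def __init__(self, io_dim: int, short_len: int = 4, long_len: int = 64) -> None:
        if not 0 < short_len <= long_len:
            raise ValueError("require 0 < short_len <= long_len")
        self.io_dim = io_dim
        self.short_len = short_len
        # Zero-fill so both windows have a fixed size from the first control step.
        self._steps: Deque[List[float]] = deque(
            ([0.0] * io_dim for _ in range(long_len)), maxlen=long_len
        )

    def push(self, obs: Sequence[float], act: Sequence[float]) -> None:
        step = [float(x) for x in obs] + [float(x) for x in act]
        if len(step) != self.io_dim:
            raise ValueError(f"expected {self.io_dim} values, got {len(step)}")
        self._steps.append(step)  # the oldest step falls off automatically

    def short_term(self) -> List[float]:
        """Last `short_len` steps, oldest first, concatenated into one vector."""
        recent = list(self._steps)[-self.short_len:]
        return [x for step in recent for x in step]

    def long_term(self) -> List[List[float]]:
        """All `long_len` steps as rows, oldest first."""
        return [list(step) for step in self._steps]


def conv1d_encode(history: List[List[float]], kernel: Sequence[float]) -> List[float]:
    """Toy long-history encoder: per-channel 1D convolution over time
    (valid padding, one shared fixed kernel) followed by mean pooling.
    A trained policy would learn its encoder weights instead."""
    T, D, K = len(history), len(history[0]), len(kernel)
    if K > T:
        raise ValueError("kernel longer than history")
    latent = []
    for d in range(D):
        responses = [
            sum(kernel[k] * history[t + k][d] for k in range(K))
            for t in range(T - K + 1)
        ]
        latent.append(sum(responses) / len(responses))
    return latent


def policy_input(buf: DualHistoryBuffer, kernel: Sequence[float],
                 command: Sequence[float]) -> List[float]:
    """Concatenate encoded long history, raw short history, and task command."""
    return conv1d_encode(buf.long_term(), kernel) + buf.short_term() + list(command)


if __name__ == "__main__":
    buf = DualHistoryBuffer(io_dim=3, short_len=2, long_len=5)
    for obs, act in [([1, 2], [3]), ([4, 5], [6]), ([7, 8], [9])]:
        buf.push(obs, act)
    print(policy_input(buf, kernel=[0.5, 0.5], command=[1.0]))
    # -> [2.125, 2.75, 3.375, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0]
```

One plausible reading of the design is that the raw short window lets the policy react quickly to fast, time-variant events such as contacts, while the compressed long window summarizes slowly varying, time-invariant dynamics shifts in a few latent values. In a trained controller, the fixed kernel here would be replaced by a learned temporal encoder, for example a 1D convolutional network.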
