Learning Agile Locomotion and Adaptive Behaviors via RL-augmented MPC

In legged robots, adaptive behavior comprises adaptive balancing and adaptive swing foot reflection. Adaptive balancing counteracts perturbations to the robot, while adaptive swing foot reflection helps the robot navigate intricate terrain without foot entrapment. In this paper, we bring both aspects of adaptive behavior to quadruped locomotion by combining reinforcement learning (RL) and model predictive control (MPC), while improving the robustness and agility of blind legged locomotion. This integration leverages MPC's predictive capabilities and RL's ability to draw on past experience. Unlike traditional locomotion controllers, which treat stance foot control and swing foot trajectory separately, our approach unifies them and thereby addresses their lack of synchronization. The core of our contribution is the synthesis of stance foot control with swing foot reflection, which improves agility and robustness in locomotion with adaptive behavior. A hallmark of our approach is robust blind stair climbing through swing foot reflection. Moreover, we intentionally designed the learning module as a general plugin for different robot platforms. We trained the policy and implemented our approach on the Unitree A1 robot, achieving a peak turn rate of 8.5 rad/s, a peak running speed of 3 m/s, and steering at a speed of 2.5 m/s. The framework also allows the robot to maintain stable locomotion while bearing an unexpected load of 10 kg, or 83% of its body mass. We further demonstrate the generalizability and robustness of the policy: the same policy transfers zero-shot to other robot platforms, such as the Go1 and AlienGo robots, for load carrying. Code is available to the research community at https://github.com/DRCL-USC/RL_augmented_MPC.git
