Robust High-speed Running for Quadruped Robots via Deep Reinforcement Learning

Deep reinforcement learning has emerged as a popular and powerful way to develop locomotion controllers for quadruped robots. Common approaches have largely focused on learning actions directly in joint space, or on learning to modify and offset foot positions produced by trajectory generators. Both approaches typically require careful reward shaping and training for millions of time steps, and trajectory generators additionally introduce human bias into the resulting control policies. In this paper, we present a learning framework that leads to the natural emergence of fast and robust bounding policies for quadruped robots. The agent both selects and controls actions directly in task space to track desired velocity commands, subject to environmental noise including model uncertainty and rough terrain. We observe that this framework improves sample efficiency, requires little reward shaping, leads to the emergence of natural gaits such as galloping and bounding, and eases the sim-to-real transfer at running speeds. Policies can be learned in only a few million time steps, even for challenging tasks such as running over rough terrain with loads of over 100% of the nominal quadruped mass. Training occurs in PyBullet, and we perform a sim-to-sim transfer to Gazebo and a sim-to-real transfer to the Unitree A1 hardware. For sim-to-sim, our results show the quadruped is able to run at over 4 m/s without a load, and at 3.5 m/s with a 10 kg load, which is over 83% of the nominal quadruped mass. For sim-to-real, the Unitree A1 is able to bound at 2 m/s with a 5 kg load, representing 42% of the nominal quadruped mass.
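
The load percentages quoted above can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state the nominal mass, so the 12 kg value is an assumption inferred from the quoted ratios (10 kg at just over 83%, 5 kg at about 42%), and `load_fraction` is a hypothetical helper name.

```python
def load_fraction(load_kg: float, nominal_mass_kg: float) -> float:
    """Return the payload as a fraction of the robot's nominal mass."""
    if nominal_mass_kg <= 0:
        raise ValueError("nominal mass must be positive")
    return load_kg / nominal_mass_kg


# Assumption: roughly 12 kg nominal mass, inferred from the abstract's
# percentages; the abstract itself does not state this number.
ASSUMED_NOMINAL_MASS_KG = 12.0

for label, load_kg in [("sim-to-sim (Gazebo)", 10.0),
                       ("sim-to-real (Unitree A1)", 5.0)]:
    pct = 100.0 * load_fraction(load_kg, ASSUMED_NOMINAL_MASS_KG)
    print(f"{label}: {load_kg:.0f} kg load = {pct:.1f}% of nominal mass")
```

Under that assumed mass, the 10 kg load comes out at 83.3% and the 5 kg load at 41.7%, which agrees with the "over 83%" and "42%" figures quoted in the abstract.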
