Quadruped robots have shown remarkable mobility on various terrains through reinforcement learning. Yet on risky terrains with sparse footholds, such as stepping stones and balance beams, which require precise foot placement to avoid falls, model-based approaches are often used. In this paper, we show that end-to-end reinforcement learning can also enable the robot to traverse risky terrains with dynamic motions. To this end, our approach first trains a generalist policy for agile locomotion on disorderly and sparse stepping stones, then transfers its reusable knowledge to various more challenging terrains by finetuning specialist policies from it. Because the robot needs to rapidly adapt its velocity on these terrains, we formulate the task as a navigation task instead of the commonly used velocity tracking, which constrains the robot's behavior, and we propose an exploration strategy to overcome sparse rewards and achieve high robustness. We validate the proposed method through simulation and real-world experiments on an ANYmal-D robot, achieving a peak forward velocity of >= 2.5 m/s on sparse stepping stones and narrow balance beams. Video: youtu.be/Z5X0J8OH6z4
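To make the contrast between the two task formulations concrete, the following is a minimal sketch and not the reward used in the paper. The function names, the exponential tracking kernel, the `1 / (1 + d^2)` goal term, and all parameter values (`sigma`, `reward_window`) are illustrative assumptions. The sketch shows why velocity tracking constrains behavior at every step, whereas a goal-reaching reward granted only near the end of an episode leaves the robot free to choose its own speed profile, at the cost of a sparse learning signal.

```python
import math


def velocity_tracking_reward(v_cmd, v_actual, sigma=0.25):
    """Dense reward (assumed form): paid at every step, so any deviation
    from the commanded planar velocity is penalized immediately."""
    err_sq = (v_cmd[0] - v_actual[0]) ** 2 + (v_cmd[1] - v_actual[1]) ** 2
    return math.exp(-err_sq / sigma)


def navigation_reward(pos, goal, t, episode_len, reward_window=1.0):
    """Sparse reward (assumed form): zero until the last `reward_window`
    seconds of the episode, then based only on distance to the goal.
    How fast the robot moved before that point is not constrained."""
    time_left = episode_len - t
    if time_left > reward_window:
        return 0.0
    dist_sq = (goal[0] - pos[0]) ** 2 + (goal[1] - pos[1]) ** 2
    return 1.0 / (1.0 + dist_sq)


if __name__ == "__main__":
    # Slowing down to place a foot carefully costs tracking reward right away.
    print(velocity_tracking_reward((1.0, 0.0), (0.2, 0.0)))
    # Under the navigation formulation, early steps earn nothing either way.
    print(navigation_reward((0.5, 0.0), (2.0, 0.0), t=1.0, episode_len=6.0))
    # Only the final position relative to the goal matters.
    print(navigation_reward((2.0, 0.0), (2.0, 0.0), t=5.5, episode_len=6.0))
```

Because the second function returns zero for most of the episode, a policy trained on it alone would receive very little feedback early in training, which is the sparse-reward problem that the exploration strategy mentioned in the abstract is meant to address.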