Reinforcement learning (RL) has shown great potential in enabling quadruped robots to perform agile locomotion. However, directly training policies to handle two extreme challenges at once, extreme underactuation and extreme terrains, as in monopedal hopping tasks, remains difficult due to unstable early-stage interactions and unreliable reward feedback. To address this, we propose JumpER (jump-start reinforcement learning via self-evolving priors), an RL training framework that structures policy learning into multiple stages of increasing complexity. By dynamically generating self-evolving priors through iterative bootstrapping of previously learned policies, JumpER progressively refines and strengthens its guidance, stabilizing exploration and policy optimization without relying on external expert priors or handcrafted reward shaping. Specifically, when integrated with a structured three-stage curriculum that incrementally evolves the action modality, observation space, and task objective, JumpER enables quadruped robots to achieve robust monopedal hopping on unpredictable terrains for the first time. Remarkably, the resulting policy handles challenging scenarios that traditional methods struggle with, including gaps up to 60 cm wide, irregularly spaced stairs, and stepping stones spaced 15 cm to 35 cm apart. JumpER thus provides a principled and scalable approach to locomotion tasks under the dual challenges of extreme underactuation and extreme terrains.
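To make the staged bootstrapping concrete, here is a minimal Python sketch of the jump-start idea under stated assumptions: within each curriculum stage, a prior policy rolls in for a shrinking number of steps before the learner takes over, and the stage's final policy becomes the self-evolving prior that bootstraps the next stage. All names (`Stage`, `rollout`, `train_stage`, `train_jumper`, `ToyEnv`, `ToyPolicy`) are hypothetical stand-ins, not the paper's implementation; a real setup would pair a physics simulator with an RL algorithm such as PPO and may use a different guidance schedule or warm-starting scheme.

```python
"""Sketch of jump-start RL with self-evolving priors across a staged
curriculum. All components here are toy stand-ins for illustration."""
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str           # e.g. "action modality", "observation space", "task objective"
    horizon: int        # episode length for this stage
    make_env: Callable  # stage-specific environment / task objective
    iters: int = 200    # training iterations for this stage

class ToyEnv:
    """Placeholder environment; a real setup would be a physics simulator."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), -abs(action), self.t >= 50  # obs, reward, done

class ToyPolicy:
    """Placeholder policy; a real setup would be a neural network."""
    def __init__(self):
        self.bias = random.uniform(-1.0, 1.0)
    def act(self, obs):
        return self.bias + random.gauss(0.0, 0.1)
    def update(self, batch):
        # Stand-in for a gradient step; shrinks action magnitude and
        # ignores the batch, which a real RL update would consume.
        self.bias *= 0.99

def rollout(env, prior, learner, guide_steps, horizon):
    """Jump-start split: the prior rolls in for `guide_steps` steps,
    then the learner takes over for the rest of the episode."""
    obs, transitions = env.reset(), []
    for t in range(horizon):
        actor = prior if (prior is not None and t < guide_steps) else learner
        action = actor.act(obs)
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions

def train_stage(stage, prior, learner):
    """Train one curriculum stage, annealing the prior's guidance to zero."""
    env = stage.make_env()
    for i in range(stage.iters):
        guide_steps = int(stage.horizon * (1.0 - i / stage.iters))
        batch = rollout(env, prior, learner, guide_steps, stage.horizon)
        learner.update(batch)
    return learner

def train_jumper(stages: List[Stage]):
    prior = None                   # the first stage explores without a prior
    for stage in stages:
        learner = ToyPolicy()      # could instead be warm-started from `prior`
        learner = train_stage(stage, prior, learner)
        prior = learner            # the new policy becomes the self-evolving
                                   # prior that bootstraps the next stage
    return prior

if __name__ == "__main__":
    curriculum = [
        Stage("action modality", horizon=50, make_env=ToyEnv),
        Stage("observation space", horizon=50, make_env=ToyEnv),
        Stage("task objective", horizon=50, make_env=ToyEnv),
    ]
    policy = train_jumper(curriculum)
    print("final policy bias:", round(policy.bias, 3))
```

The linear annealing of `guide_steps` is one simple choice for retiring the prior's guidance; the key structural point from the abstract is only that each stage's learned policy, rather than an external expert, supplies the prior for the next stage.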