Running up stairs is effortless for humans but remains extremely challenging for humanoid robots, because it demands high agility and strict stability at the same time. Model-free reinforcement learning (RL) can generate dynamic locomotion, yet its implicit stability rewards and heavy reliance on task-specific reward shaping often produce unsafe behaviors, especially on stairs. Conversely, model-based foothold planners encode contact feasibility and stability structure, but enforcing their hard constraints often induces conservative motion that limits speed. We present FastStair, a planner-guided, multi-stage learning framework that combines these complementary strengths to achieve fast and stable stair ascent. FastStair integrates a parallel model-based foothold planner into the RL training loop to bias exploration toward dynamically feasible contacts and to pretrain a safety-focused base policy. To mitigate planner-induced conservatism and the discrepancy between low- and high-speed action distributions, the base policy is fine-tuned into speed-specialized experts, which are then integrated via Low-Rank Adaptation (LoRA) to enable smooth operation across the full commanded-speed range. We deploy the resulting controller on the Oli humanoid robot, achieving stable stair ascent at commanded speeds up to 1.65 m/s and traversing a 33-step spiral staircase (17 cm rise per step) in 12 s, which demonstrates robust high-speed performance on long staircases. Notably, the proposed approach was the champion solution in the Canton Tower Robot Run Up Competition.
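For readers unfamiliar with Low-Rank Adaptation, the following is a minimal sketch of the generic LoRA mechanism only: a frozen weight matrix plus a trainable low-rank update, scaled by `alpha / rank`. It is not FastStair's implementation. The abstract does not specify how the speed-specialized experts are combined, so the class name `LoRALinear`, the helper `matvec`, the layer sizes, and the initialization are all hypothetical choices made for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Generic LoRA layer: y = W x + (alpha / rank) * B (A x).

    Hypothetical illustration; not taken from the FastStair source.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, shape (out_dim, in_dim)
        # A gets a small random init, shape (rank, in_dim).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        # B starts at zero, shape (out_dim, rank), so the adapter is
        # initially a no-op and the layer reproduces the base output.
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def merged_weight(self):
        """Fold the low-rank update into a single dense matrix."""
        rank = len(self.A)
        return [
            [w + self.scale * sum(self.B[i][k] * self.A[k][j]
                                  for k in range(rank))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.weight)
        ]

    def num_adapter_params(self):
        """Count of trainable adapter parameters: rank * (in_dim + out_dim)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 2x3 base layer with a rank-1 adapter.
base = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
layer = LoRALinear(base, rank=1)
x = [1.0, 2.0, 3.0]
y_before = layer.forward(x)      # equals the base output while B is zero

layer.B = [[1.0], [-2.0]]        # stand-in for a fine-tuned adapter
y_after = layer.forward(x)
y_merged = matvec(layer.merged_weight(), x)
```

Because the update can be folded back into one dense matrix, a merged adapter adds no inference cost, and only `rank * (in_dim + out_dim)` parameters are trained per adapted layer. Whether FastStair merges, switches, or blends its adapters is not stated in the abstract.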