We present a unified gait-conditioned reinforcement learning framework that enables humanoid robots to perform standing, walking, running, and smooth transitions between them within a single recurrent policy. A compact reward routing mechanism activates gait-specific objectives based on a one-hot gait ID, mitigating reward interference and supporting stable multi-gait learning. Human-inspired reward terms promote biomechanically natural motions, such as straight-knee stance and coordinated arm-leg swing, without requiring motion capture data. A structured curriculum introduces gait complexity and expands the command space progressively over multiple phases. In simulation, the policy achieves robust standing, walking, running, and gait transitions. On the real Unitree G1 humanoid, we validate standing, walking, and walk-to-stand transitions, demonstrating stable and coordinated locomotion. This work provides a scalable, reference-free approach to versatile and naturalistic humanoid control across diverse modes and environments.
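The reward routing idea described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the gait ordering, the state fields, the individual reward terms, their assignment to gaits, and the absence of per-term weights are all assumptions made only for illustration. The sketch shows the one property the abstract states, namely that a one-hot gait ID selects which gait-specific terms contribute to the reward, so that objectives for inactive gaits cannot interfere.

```python
from typing import Callable, Dict, List, Sequence

# Assumed ordering of the one-hot gait ID (hypothetical, not from the paper).
GAITS = ("stand", "walk", "run")

State = Dict[str, float]
RewardTerm = Callable[[State], float]


# Hypothetical reward terms; names and formulas are illustrative only.
def upright(s: State) -> float:
    return -abs(s["base_pitch"])


def straight_knee(s: State) -> float:
    return -abs(s["knee_angle"])


def track_velocity(s: State) -> float:
    return -abs(s["vel_cmd"] - s["vel"])


def arm_leg_swing(s: State) -> float:
    return s["arm_leg_phase_corr"]


# Terms applied regardless of gait.
SHARED: List[RewardTerm] = [upright]

# Routing table: which terms are active for which gait (assumed assignment).
ROUTED: Dict[str, List[RewardTerm]] = {
    "stand": [straight_knee],
    "walk": [track_velocity, arm_leg_swing],
    "run": [track_velocity],
}


def routed_reward(state: State, gait_one_hot: Sequence[int]) -> float:
    """Sum the shared terms plus only the terms routed to the active gait."""
    if sorted(gait_one_hot) != [0] * (len(GAITS) - 1) + [1]:
        raise ValueError("gait_one_hot must be one-hot over GAITS")
    total = sum(term(state) for term in SHARED)
    for flag, gait in zip(gait_one_hot, GAITS):
        if flag:
            total += sum(term(state) for term in ROUTED[gait])
    return total


if __name__ == "__main__":
    state = {
        "base_pitch": 0.1,
        "knee_angle": 0.2,
        "vel_cmd": 1.0,
        "vel": 0.5,
        "arm_leg_phase_corr": 0.25,
    }
    for i, gait in enumerate(GAITS):
        one_hot = [int(j == i) for j in range(len(GAITS))]
        print(gait, routed_reward(state, one_hot))
```

Under this sketch, a term such as `arm_leg_swing` contributes nothing while the gait ID selects standing, which is the sense in which routing avoids conflicting objectives across gaits. A practical version would likely add per-term weights and operate on batched tensors.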