We present a unified gait-conditioned reinforcement learning framework that enables humanoid robots to perform standing, walking, running, and smooth transitions between them within a single recurrent policy. A compact reward routing mechanism dynamically activates gait-specific objectives based on a one-hot gait ID, mitigating reward interference and supporting stable multi-gait learning. Human-inspired reward terms promote biomechanically natural motions, such as straight-knee stance and coordinated arm-leg swing, without requiring motion capture data. A structured curriculum progressively introduces gait complexity and expands the command space over multiple phases. In simulation, the policy achieves robust standing, walking, running, and gait transitions. On the real Unitree G1 humanoid, we validate standing, walking, and walk-to-stand transitions, demonstrating stable and coordinated locomotion. This work provides a scalable, reference-free approach to versatile and naturalistic humanoid control across diverse modes and environments.
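The reward routing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gait ordering, the state keys, the individual reward terms (`upright`, `straight_knee`, `track_velocity`, `arm_leg_swing`), and all weights are hypothetical and chosen only to show how a one-hot gait ID might select which objectives contribute to the total reward.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed ordering of the one-hot gait ID; the abstract does not specify it.
GAITS = ("stand", "walk", "run")

State = Dict[str, float]
Term = Tuple[float, Callable[[State], float]]  # (weight, reward function)


# Hypothetical reward terms, each returning a non-positive penalty.
def upright(s: State) -> float:
    return -abs(s["pitch"])


def straight_knee(s: State) -> float:
    return -abs(s["knee_angle"])


def track_velocity(s: State) -> float:
    return -abs(s["vel"] - s["vel_cmd"])


def arm_leg_swing(s: State) -> float:
    # Same-side arm and leg swinging in opposition sum to roughly zero.
    return -abs(s["arm_phase"] + s["leg_phase"])


# Terms applied regardless of gait (illustrative weights).
SHARED: List[Term] = [(1.0, upright)]

# Terms that are only active for one gait (illustrative weights).
GAIT_TERMS: Dict[str, List[Term]] = {
    "stand": [(0.5, straight_knee)],
    "walk": [(1.0, track_velocity), (0.2, arm_leg_swing)],
    "run": [(1.5, track_velocity), (0.2, arm_leg_swing)],
}


def routed_reward(state: State, gait_one_hot: Sequence[int]) -> float:
    """Sum the shared terms plus only the terms of the active gait."""
    expected = [0] * (len(GAITS) - 1) + [1]
    if len(gait_one_hot) != len(GAITS) or sorted(gait_one_hot) != expected:
        raise ValueError("gait_one_hot must be a one-hot vector over GAITS")
    total = sum(w * fn(state) for w, fn in SHARED)
    for active, gait in zip(gait_one_hot, GAITS):
        if active:
            total += sum(w * fn(state) for w, fn in GAIT_TERMS[gait])
    return total


state = {"pitch": 0.1, "knee_angle": 0.2, "vel": 0.5, "vel_cmd": 1.0,
         "arm_phase": 0.3, "leg_phase": -0.3}
stand_reward = routed_reward(state, [1, 0, 0])
walk_reward = routed_reward(state, [0, 1, 0])
```

In this sketch the standing reward never sees the velocity-tracking term and the walking reward never sees the straight-knee term, which is one plausible reading of how routing keeps gait-specific objectives from interfering with one another.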