We present a unified gait-conditioned reinforcement learning framework that enables humanoid robots to stand, walk, run, and transition smoothly between these gaits within a single recurrent policy. A compact reward routing mechanism activates gait-specific objectives according to a one-hot gait ID, which mitigates reward interference and supports stable multi-gait learning. Human-inspired reward terms promote biomechanically natural motions, such as straight-knee stance and coordinated arm-leg swing, without requiring motion capture data. A structured curriculum introduces gait complexity progressively and expands the command space over multiple phases. In simulation, the policy achieves robust standing, walking, running, and gait transitions. On the real Unitree G1 humanoid, we validate standing, walking, and walk-to-stand transitions, demonstrating stable and coordinated locomotion. This work offers a scalable, reference-free approach to versatile and natural humanoid control across diverse modes and environments.
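To make the reward routing idea in the abstract concrete, the following is a minimal sketch of how a one-hot gait ID could select which objectives contribute to the reward. It is an illustration under stated assumptions, not the authors' implementation: the gait ordering, the state keys, the individual reward terms, and all weights are hypothetical placeholders, since the abstract does not specify them.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed ordering of the one-hot gait ID (not given in the abstract).
GAITS = ("stand", "walk", "run")

State = Dict[str, float]
RewardTerm = Tuple[float, Callable[[State], float]]  # (weight, reward function)


# Hypothetical reward terms; state keys and shapes are placeholders.
def upright_reward(state: State) -> float:
    """Penalize torso pitch away from vertical."""
    return -abs(state["torso_pitch"])


def stillness_reward(state: State) -> float:
    """Penalize any forward velocity while standing."""
    return -abs(state["forward_vel"])


def velocity_tracking_reward(state: State) -> float:
    """Penalize deviation from the commanded forward velocity."""
    return -abs(state["forward_vel"] - state["cmd_vel"])


def flight_phase_reward(state: State) -> float:
    """Reward moments with no foot in contact (an assumed running cue)."""
    return 1.0 if state["feet_in_contact"] == 0 else 0.0


# Terms applied to every gait, plus terms routed by gait ID.
SHARED_TERMS: List[RewardTerm] = [(1.0, upright_reward)]
GAIT_TERMS: Dict[str, List[RewardTerm]] = {
    "stand": [(1.0, stillness_reward)],
    "walk": [(1.0, velocity_tracking_reward)],
    "run": [(1.0, velocity_tracking_reward), (0.5, flight_phase_reward)],
}


def routed_reward(state: State, gait_one_hot: Sequence[int]) -> float:
    """Sum the shared terms plus only the terms of the active gait."""
    if len(gait_one_hot) != len(GAITS):
        raise ValueError("gait ID has the wrong length")
    if sum(gait_one_hot) != 1 or any(g not in (0, 1) for g in gait_one_hot):
        raise ValueError("gait ID must be one-hot")

    total = sum(w * fn(state) for w, fn in SHARED_TERMS)
    for active, gait in zip(gait_one_hot, GAITS):
        if active:
            # Terms of inactive gaits are never evaluated, so they
            # cannot interfere with the active gait's objective.
            total += sum(w * fn(state) for w, fn in GAIT_TERMS[gait])
    return total
```

In this sketch, a state that tracks a 1 m/s command with both feet airborne is scored very differently depending on the gait ID: it is penalized under the standing objective and rewarded under the running objective, while a single policy still receives one scalar reward per step.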