Humans primarily rely on walking and running to traverse complex terrains. Similarly, humanoid robots should be able to smoothly transition between walking and running while maintaining natural and stable locomotion. However, unifying gait transition and multi-terrain adaptation within a single policy remains challenging due to gradient interference between tasks and the distribution shift caused by terrain variations. Although Mixture-of-Experts (MoE) architectures can mitigate multi-skill interference, direct joint training often fails to achieve clear expert specialization. To address these challenges, we propose CoRe-MoE, a two-stage reinforcement learning framework that decouples gait generation from terrain adaptation. In the first stage, a stable locomotion policy is learned to produce natural walking and running behaviors with smooth transitions. In the second stage, a terrain-aware MoE branch is introduced, and the gating network is trained with a contrastive objective to learn structured terrain representations and promote expert specialization. The final action is obtained through weighted fusion of the base gait policy and the terrain-aware branch, enabling the policy to preserve stable locomotion while adapting to complex terrains. Extensive simulation results demonstrate that the proposed method outperforms baseline approaches in terms of success rate, locomotion stability, and multi-terrain adaptability. Furthermore, zero-shot deployment on a Unitree G1 humanoid robot validates the effectiveness of our framework, achieving robust walking and running across stairs, slopes, steps, obstacles, and unstructured outdoor terrains while maintaining accurate foothold control and dynamic stability.
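To make the second-stage structure described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows a gated mixture of experts producing a terrain-aware action, a weighted fusion with the base gait action, and a contrastive loss over terrain labels. Everything specific here is an assumption for illustration: the names `moe_branch`, `fuse_actions`, and `supervised_contrastive_loss`, the linear experts, the convex-combination form of the fusion with a scalar `alpha`, and the choice of a supervised InfoNCE-style loss. The abstract states only that the fusion is weighted and that the gating network is trained with a contrastive objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights):
    """Multiply vector x (length D) by a matrix given as D rows of length K."""
    return [sum(x[d] * weights[d][k] for d in range(len(x)))
            for k in range(len(weights[0]))]


def moe_branch(terrain_feat, gate_w, expert_ws):
    """Hypothetical terrain-aware branch: gate-weighted sum of linear experts."""
    gate = softmax(linear(terrain_feat, gate_w))
    expert_actions = [linear(terrain_feat, w) for w in expert_ws]
    action = [sum(g * a[j] for g, a in zip(gate, expert_actions))
              for j in range(len(expert_actions[0]))]
    return action, gate


def fuse_actions(base_action, terrain_action, alpha):
    """Assumed fusion rule: convex combination with weight alpha in [0, 1]."""
    return [(1.0 - alpha) * b + alpha * t
            for b, t in zip(base_action, terrain_action)]


def supervised_contrastive_loss(embeddings, terrain_ids, temperature=0.1):
    """InfoNCE-style loss: samples sharing a terrain id are positives.

    This is a stand-in objective, not necessarily the one used in the paper.
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [unit(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Scaled cosine similarity to every other sample (self excluded).
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        positives = [j for j in sims if terrain_ids[j] == terrain_ids[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With `alpha = 0` the fused action reduces to the base gait action, which reflects the stated goal of preserving stable locomotion. The contrastive term decreases as gating embeddings from the same terrain cluster together, which is one plausible route to the expert specialization the abstract describes.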