For effective deployment in real-world environments, humanoid robots must autonomously navigate a diverse range of complex terrains with abrupt transitions. While the vanilla mixture-of-experts (MoE) framework is theoretically capable of modeling diverse terrain features, in practice its gating network produces nearly uniform expert activations across different terrains, which weakens expert specialization and limits the model's expressive power. To address this limitation, we introduce CMoE, a novel single-stage reinforcement learning framework that integrates contrastive learning to refine expert activation distributions. By imposing contrastive constraints, CMoE maximizes the consistency of expert activations within the same terrain while minimizing their similarity across different terrains, thereby encouraging experts to specialize in distinct terrain types. We validate our approach on the Unitree G1 humanoid robot through a series of challenging experiments. Results demonstrate that CMoE enables the robot to traverse continuous steps up to 20 cm high and gaps up to 80 cm wide, while achieving a robust and natural gait across diverse mixed terrains, exceeding what existing methods achieve. To support further research and foster community development, we release our code publicly.
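The contrastive constraint described above can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the code below assumes a supervised-contrastive (InfoNCE-style) objective over the gating network's expert activation vectors, with terrain labels defining positive pairs. The function name `gate_contrastive_loss`, the cosine similarity measure, and the `temperature` value are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def gate_contrastive_loss(gate_probs, terrain_ids, temperature=0.1):
    """Supervised-contrastive loss over expert activation vectors (illustrative).

    gate_probs:  (N, E) array, each row a gating distribution over E experts.
    terrain_ids: (N,) integer array, terrain label of each sample.
    Samples sharing a terrain are pulled together; others are pushed apart.
    """
    gate_probs = np.asarray(gate_probs, dtype=float)
    terrain_ids = np.asarray(terrain_ids)

    # Cosine similarity between every pair of activation vectors.
    unit = gate_probs / np.linalg.norm(gate_probs, axis=1, keepdims=True)
    sim = unit @ unit.T / temperature

    n = len(terrain_ids)
    not_self = ~np.eye(n, dtype=bool)
    positives = (terrain_ids[:, None] == terrain_ids[None, :]) & not_self

    # Log-softmax of each row over all *other* samples (self excluded).
    sim = np.where(not_self, sim, -np.inf)
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Average log-probability of the positives for each anchor that has any.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[valid] / pos_count[valid]).mean())


# Two terrains, two samples each (labels are made up for the example).
terrains = np.array([0, 0, 1, 1])

# Nearly uniform gating: every terrain activates the experts identically.
uniform_gates = np.full((4, 2), 0.5)

# Specialized gating: each terrain favors its own expert.
specialized_gates = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])

loss_uniform = gate_contrastive_loss(uniform_gates, terrains)
loss_specialized = gate_contrastive_loss(specialized_gates, terrains)
print(loss_uniform, loss_specialized)
```

Under these assumptions, the uniform gating pattern incurs a higher loss than the specialized one, so minimizing such a term alongside the reinforcement learning objective would push the gate toward terrain-specific expert activations.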