Reinforcement learning has shown strong promise for agile quadrupedal locomotion, even with proprioception-only sensing. In practice, however, the sim-to-real gap and reward overfitting on complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework that pairs a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation with RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy employs a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization from proprioception alone. RoboGauge complements it with multi-dimensional, proprioception-based metrics computed from sim-to-sim tests across terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.
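To make the idea of a gated set of specialist experts concrete, the following is a minimal sketch of a mixture-of-experts forward pass in plain Python. It is illustrative only and is not the authors' architecture: the softmax gate, the single linear layer per expert, the dense (non-sparse) blending, and every name and dimension (`MoEPolicy`, `obs_dim=45`, `num_experts=4`) are assumptions. Only `act_dim=12` is tied to the hardware, since the Unitree Go2 has 12 actuated joints.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(rng, n_out, n_in):
    """Create a small random weight matrix and a zero bias (hypothetical init)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(weights, bias, x):
    """Compute W @ x + b for list-based weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class MoEPolicy:
    """Hypothetical gated mixture of linear experts over proprioceptive input."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = random.Random(seed)
        # The gate maps an observation to one logit per expert (assumed design).
        self.gate = make_linear(rng, num_experts, obs_dim)
        # Each expert maps the same observation to a full action vector.
        self.experts = [make_linear(rng, act_dim, obs_dim) for _ in range(num_experts)]
        self.act_dim = act_dim

    def act(self, obs):
        """Return the gate-weighted blend of expert actions and the gate weights."""
        gate_weights = softmax(linear(*self.gate, obs))
        expert_actions = [linear(w, b, obs) for w, b in self.experts]
        action = [
            sum(g * a[i] for g, a in zip(gate_weights, expert_actions))
            for i in range(self.act_dim)
        ]
        return action, gate_weights


if __name__ == "__main__":
    policy = MoEPolicy(obs_dim=45, act_dim=12, num_experts=4)
    rng = random.Random(1)
    obs = [rng.uniform(-1.0, 1.0) for _ in range(45)]  # stand-in proprioceptive vector
    action, gate_weights = policy.act(obs)
    print(len(action), [round(g, 3) for g in gate_weights])
```

In this sketch the gate weights form a probability distribution over experts, so each expert's contribution to the joint command can be inspected per step. A trained system would presumably replace the linear layers with learned networks and may route differently.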