Reinforcement learning has shown strong promise for agile quadrupedal locomotion, even with proprioception-only sensing. In practice, however, the sim-to-real gap and reward overfitting on complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework that combines a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation with RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy uses a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization from proprioception alone. RoboGauge complements it with multi-dimensional, proprioception-based metrics computed from sim-to-sim tests across terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.
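To make the gated-expert idea concrete, the following is a minimal sketch of a softmax-gated mixture of linear experts acting on a proprioceptive observation. It is an illustration only, not the architecture used in this work: the class name `GatedMoEPolicy`, the linear experts, the random initialization, and the observation size of 45 are all hypothetical choices made for the example. Only the 12-dimensional action, one target per actuated joint of the Unitree Go2, reflects the platform.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class GatedMoEPolicy:
    """Hypothetical sketch: a gate mixes the outputs of linear experts."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Gate parameters: map an observation to one logit per expert.
        self.gate_w = rng.normal(scale=0.1, size=(obs_dim, num_experts))
        # Expert parameters: one linear map per expert (an assumption;
        # real experts would typically be small neural networks).
        self.expert_w = rng.normal(
            scale=0.1, size=(num_experts, obs_dim, act_dim)
        )

    def gate(self, obs):
        """Return mixing weights of shape (batch, num_experts)."""
        return softmax(obs @ self.gate_w)

    def act(self, obs):
        """Return the gate-weighted sum of expert actions, (batch, act_dim)."""
        weights = self.gate(obs)
        # Every expert proposes an action for every observation.
        expert_out = np.einsum("bo,eoa->bea", obs, self.expert_w)
        # Blend the proposals using the gate weights.
        return np.einsum("be,bea->ba", weights, expert_out)


if __name__ == "__main__":
    policy = GatedMoEPolicy(obs_dim=45, act_dim=12, num_experts=4)
    obs = np.ones((3, 45))  # placeholder proprioceptive observations
    print(policy.gate(obs).shape, policy.act(obs).shape)
```

Because the gate weights sum to one for each observation, the output is a convex combination of the expert actions, which is what allows individual experts to specialize on particular terrains or command ranges.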