Humanoid whole-body controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet most target a single robot embodiment. Variations in dynamics, degrees of freedom (DoFs), and kinematic topology still prevent a single policy from commanding diverse humanoids. Moreover, obtaining a generalist policy that not only transfers across embodiments but also supports richer behaviors, beyond simple walking to squatting and leaning, remains especially challenging. In this work, we tackle these obstacles by introducing EAGLE, an iterative generalist-specialist distillation framework that produces a single unified policy controlling multiple heterogeneous humanoids without per-robot reward tuning. In each cycle, embodiment-specific specialists are forked from the current generalist and refined on their respective robots, and their new skills are distilled back into the generalist by training on the pooled embodiment set. Repeating this loop until performance converges produces a robust whole-body controller validated on robots such as the Unitree H1, G1, and Fourier N1. We conducted experiments on five different robots in simulation and four in real-world settings. In quantitative evaluations, EAGLE achieves higher tracking accuracy and robustness than competing methods, marking a step toward scalable, fleet-level humanoid control. See more details at https://eagle-wbc.github.io/
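The iterative generalist-specialist loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper functions (`fork`, `rl_finetune`, `distill`, `evaluate`) are hypothetical stand-ins, here stubbed with toy dictionary-valued "policies" so the control flow is runnable.

```python
def fork(generalist, robot):
    # Hypothetical: copy the generalist as the specialist's starting point.
    return dict(generalist)

def rl_finetune(policy, robot):
    # Hypothetical stand-in for RL refinement on one embodiment.
    policy[robot] = policy.get(robot, 0.0) + 1.0

def distill(specialists, robots):
    # Hypothetical stand-in for distillation on the pooled embodiment set,
    # here a simple average over the specialists' behavior.
    return {r: sum(s.get(r, 0.0) for s in specialists.values()) / len(specialists)
            for r in robots}

def evaluate(generalist, robots):
    # Hypothetical aggregate tracking-performance score.
    return sum(generalist.get(r, 0.0) for r in robots)

def eagle_training_loop(generalist, robots, max_cycles=10, tol=1e-3):
    """Alternate specialist refinement and generalist distillation
    until the aggregate score stops improving."""
    prev_score = float("-inf")
    for _ in range(max_cycles):
        # 1. Fork an embodiment-specific specialist per robot.
        specialists = {r: fork(generalist, r) for r in robots}
        # 2. Refine each specialist on its own embodiment.
        for r, policy in specialists.items():
            rl_finetune(policy, r)
        # 3. Distill the specialists' skills back into one generalist.
        generalist = distill(specialists, robots)
        # 4. Stop once performance converges.
        score = evaluate(generalist, robots)
        if score - prev_score < tol:
            break
        prev_score = score
    return generalist
```

The key design choice is that specialists always restart from the current generalist, so each cycle's per-robot refinement builds on skills already shared across the fleet.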