Recent advances in reinforcement learning (RL) have enabled impressive humanoid behaviors in simulation, yet transferring these results to new robots remains challenging. In many real deployments, the primary bottleneck is no longer simulation throughput or algorithm design, but the absence of systematic infrastructure that links environment verification, training, evaluation, and deployment in a coherent loop. To address this gap, we present AGILE, an end-to-end workflow for humanoid RL that standardizes the policy-development lifecycle to mitigate common sim-to-real failure modes. AGILE comprises four stages: (1) interactive environment verification, (2) reproducible training, (3) unified evaluation, and (4) descriptor-driven deployment via robot/task configuration descriptors. In the evaluation stage, AGILE supports both scenario-based tests and randomized rollouts under a shared suite of motion-quality diagnostics, enabling automated regression testing and principled robustness assessment. In the training stage, AGILE incorporates a set of training stabilizations and algorithmic enhancements to improve optimization stability and sim-to-real transfer. With this pipeline in place, we validate AGILE on five representative humanoid skills spanning locomotion, recovery, motion imitation, and loco-manipulation across two hardware platforms (Unitree G1 and Booster T1), achieving consistent sim-to-real transfer. Overall, AGILE shows that a standardized, end-to-end workflow can substantially improve the reliability and reproducibility of humanoid RL development.