We propose a unified reinforcement learning framework that enables a single policy to perform walking, running, and fall recovery on the Unitree G1 humanoid robot, validated on physical hardware without any explicit mode-switching command at deployment. The framework extends Adversarial Motion Priors (AMP) by replacing the conventional global reference distribution with a state-dependent gate that routes each training transition to one of two discriminators: a dedicated recovery discriminator and a velocity-conditioned locomotion discriminator that jointly covers walking and running. The gate is defined by a single fixed threshold on projected gravity: the recovery discriminator is activated when body tilt exceeds approximately $37^\circ$ from vertical ($|g_z+1|>0.6$); otherwise the locomotion discriminator is used, with the normalized commanded velocity serving as a condition that selects the appropriate reference trajectory between walk and run clips. Only three LAFAN1 reference clips are required to regularize the complete behavior set. At deployment, a single frozen ONNX policy executes at 50\,Hz with no runtime mode logic; hardware experiments demonstrate successful recovery from both prone and supine falls and smooth walk-to-run transitions under the same controller.
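The gating rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes projected gravity is a unit vector whose z-component is $-1$ when the robot is upright, and the names `Transition`, `route`, `cmd_speed`, and the normalization constant `max_speed` are hypothetical. Under the unit-vector assumption the condition $|g_z+1|>0.6$ is equivalent to $g_z>-0.4$; the sketch tests the inequality directly rather than a tilt angle, since the angle mapping depends on the normalization convention.

```python
from dataclasses import dataclass

# Threshold taken from the text: recovery is active when |g_z + 1| > 0.6.
RECOVERY_THRESHOLD = 0.6


@dataclass
class Transition:
    """One training transition (hypothetical, reduced to the gate's inputs)."""
    g_z: float        # z-component of projected gravity; assumed -1 when upright
    cmd_speed: float  # commanded forward speed in m/s (hypothetical field)


def route(t: Transition, max_speed: float = 3.0):
    """Return (discriminator_name, condition) for a single transition.

    max_speed is an assumed normalization constant, not a value from the paper.
    """
    if abs(t.g_z + 1.0) > RECOVERY_THRESHOLD:
        # Large tilt: the transition is scored by the recovery discriminator,
        # which takes no velocity condition in this sketch.
        return "recovery", None
    # Otherwise the locomotion discriminator is used, conditioned on the
    # commanded velocity normalized and clipped to [0, 1].
    condition = min(max(t.cmd_speed / max_speed, 0.0), 1.0)
    return "locomotion", condition
```

Because the gate depends only on the current state and command, it is needed only when assigning style rewards during training; the deployed policy contains no corresponding branch.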