Learning a general humanoid whole-body controller is challenging because practical reference motions often contain noise and inconsistencies after being transferred to the robot domain, and closed-loop execution can amplify local defects, causing drift or failure in highly dynamic and contact-rich behaviors. We propose a dynamics-conditioned command aggregation framework that uses a causal temporal encoder to summarize recent proprioception and a multi-head cross-attention command encoder to selectively aggregate a context window based on the current dynamics. We further integrate a fall recovery curriculum with random unstable initialization and an annealed upward assistance force to improve robustness and disturbance rejection. The resulting policy is trained on only about 3.5 hours of motion data in a single end-to-end stage, without distillation. We evaluate the method under diverse reference inputs and challenging motion regimes, where it demonstrates zero-shot transfer to unseen motions as well as robust sim-to-real transfer on a physical humanoid robot.
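To illustrate the dynamics-conditioned command aggregation described in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the causal temporal encoder has already produced a dynamics embedding, which serves as the single attention query, while the context window of reference commands supplies the keys and values. The function name `aggregate_commands`, the projection matrices `Wq`, `Wk`, `Wv`, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_commands(dyn_embed, cmd_window, Wq, Wk, Wv, num_heads):
    """Multi-head cross-attention over a window of reference commands.

    dyn_embed:  (d_dyn,)    dynamics embedding (assumed encoder output), the query
    cmd_window: (T, d_cmd)  context window of reference commands, keys and values
    Wq: (d_dyn, d_model), Wk and Wv: (d_cmd, d_model)  hypothetical projections
    Returns the aggregated command (d_model,) and attention weights (num_heads, T).
    """
    d_model = Wq.shape[1]
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    T = cmd_window.shape[0]

    # Project the query, keys, and values, then split them into heads.
    q = (dyn_embed @ Wq).reshape(num_heads, d_head)                            # (H, d_head)
    k = (cmd_window @ Wk).reshape(T, num_heads, d_head).transpose(1, 0, 2)     # (H, T, d_head)
    v = (cmd_window @ Wv).reshape(T, num_heads, d_head).transpose(1, 0, 2)     # (H, T, d_head)

    # Scaled dot-product scores of the single query against every window step.
    scores = np.einsum("hd,htd->ht", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                                         # (H, T)

    # Weighted sum of values per head, then concatenate the heads.
    out = np.einsum("ht,htd->hd", weights, v).reshape(-1)                      # (d_model,)
    return out, weights


# Example with random inputs; all sizes are arbitrary.
rng = np.random.default_rng(0)
d_dyn, d_cmd, d_model, H, T = 16, 12, 32, 4, 10
Wq = 0.1 * rng.normal(size=(d_dyn, d_model))
Wk = 0.1 * rng.normal(size=(d_cmd, d_model))
Wv = 0.1 * rng.normal(size=(d_cmd, d_model))
agg, attn = aggregate_commands(
    rng.normal(size=d_dyn), rng.normal(size=(T, d_cmd)), Wq, Wk, Wv, H
)
print(agg.shape, attn.shape)
```

Because the query is derived from the robot's current dynamics rather than from the commands themselves, each head can weight the steps of the window differently depending on the robot's state, which is one plausible way to down-weight locally defective reference frames. The sketch omits the output projection, normalization layers, and learned parameters that a trained model would include.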