Humanoid parkour requires locomotion policies to coordinate whole-body dynamics across rapidly changing terrains such as stairs, gaps, slopes, and obstacles. Existing reinforcement learning policies are largely reactive, mapping observations directly to actions without explicitly modeling future body states. Such modeling becomes critical in agile locomotion tasks, where successful execution depends strongly on anticipating upcoming contact transitions and body dynamics. We present ParkourFormer, a Transformer-based sequence modeling framework that reformulates humanoid locomotion as a future-conditioned decision-making problem. The current robot state queries historical sensorimotor trajectories through cross-attention, while a lightweight prediction head forecasts short-horizon future proprioceptive states. The predicted future states, trained with supervised signals, are fused with temporal features to generate actions, enabling the policy to jointly reason over motion history and anticipated future dynamics. We evaluate ParkourFormer on a diverse multi-terrain humanoid parkour benchmark including stairs, gaps, slopes, rough terrain, and obstacle traversal. Experiments in simulation and on a real humanoid robot show that ParkourFormer achieves a 93.85% average traversal success rate on highly challenging terrains, with improvements of up to 42.73% over strong MLP, MoE-based MLP, and vanilla Transformer baselines, while maintaining a single unified policy across all terrain types. These results demonstrate that explicit future-state modeling significantly improves robustness and generalization for agile whole-body locomotion.
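To make the data flow described above concrete, the following is a minimal NumPy sketch of one forward pass: the current state forms a query, the history supplies keys and values, a linear head predicts a short horizon of future proprioceptive states, and the prediction is concatenated with the attended temporal feature to produce an action. This is not the authors' implementation. The class name `FutureConditionedPolicySketch`, the single attention head, the linear layers, the `tanh` action squashing, the concatenation-based fusion, and every dimension are assumptions chosen for illustration; the abstract does not specify them. The history is also simplified to past proprioceptive states only, whereas the abstract's sensorimotor trajectories presumably include more than that.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class FutureConditionedPolicySketch:
    """Hypothetical single-head sketch; all sizes and layers are assumptions."""

    def __init__(self, obs_dim, act_dim, d_model=32, horizon=3, seed=0):
        rng = np.random.default_rng(seed)

        def init(m, n):
            # Scaled Gaussian initialization (illustrative choice).
            return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

        self.obs_dim = obs_dim
        self.d_model = d_model
        self.horizon = horizon
        self.W_q = init(obs_dim, d_model)   # query from the current state
        self.W_k = init(obs_dim, d_model)   # keys from the history
        self.W_v = init(obs_dim, d_model)   # values from the history
        self.W_pred = init(d_model, horizon * obs_dim)           # prediction head
        self.W_act = init(d_model + horizon * obs_dim, act_dim)  # action head

    def forward(self, current, history):
        """current: (obs_dim,); history: (T, obs_dim)."""
        q = current @ self.W_q                   # (d_model,)
        k = history @ self.W_k                   # (T, d_model)
        v = history @ self.W_v                   # (T, d_model)
        scores = k @ q / np.sqrt(self.d_model)   # (T,)
        weights = softmax(scores)                # attention over past steps
        temporal = weights @ v                   # (d_model,) temporal feature
        future = temporal @ self.W_pred          # flattened future prediction
        fused = np.concatenate([temporal, future])
        action = np.tanh(fused @ self.W_act)     # bounded action (assumption)
        return action, future.reshape(self.horizon, self.obs_dim), weights


def prediction_loss(pred_future, true_future):
    """Mean squared error, standing in for the supervised signal."""
    return float(np.mean((pred_future - true_future) ** 2))


policy = FutureConditionedPolicySketch(obs_dim=8, act_dim=4)
demo_rng = np.random.default_rng(1)
history = demo_rng.normal(size=(10, 8))   # ten past states (synthetic)
current = demo_rng.normal(size=8)
action, pred_future, attn = policy.forward(current, history)
```

In a training loop, `prediction_loss` would be evaluated against the future states actually observed in the rollout and added to the reinforcement learning objective; how the two terms are weighted is not stated in the abstract and is left open here.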