Humanoid locomotion has advanced rapidly with deep reinforcement learning (DRL), enabling robust foot-based traversal of uneven terrain. Yet platforms taller than the robot's leg length remain largely out of reach, because current RL training paradigms often converge to jumping-like solutions that are high-impact, torque-limited, and unsafe for real-world deployment. To address this gap, we propose APEX, a system for perceptive, climbing-based high-platform traversal that composes terrain-conditioned behaviors: climb-up and climb-down at vertical edges, walking or crawling on the platform, and stand-up and lie-down for posture reconfiguration. Central to our approach is a generalized ratchet progress reward for learning contact-rich, goal-reaching maneuvers. It tracks the best task progress achieved so far and penalizes steps that do not improve on it, providing dense yet velocity-free supervision that enables efficient exploration under strong safety regularization. Building on this formulation, we train LiDAR-based full-body maneuver policies and reduce the sim-to-real perception gap with a dual strategy: modeling mapping artifacts during training, and filtering and inpainting elevation maps during deployment. Finally, we distill all six skills into a single policy that autonomously selects behaviors and transitions between them based on local geometry and commands. Experiments on a 29-DoF Unitree G1 humanoid demonstrate zero-shot sim-to-real traversal of 0.8 m platforms (approximately 114% of leg length), with robust adaptation to platform height and initial pose, as well as smooth and stable multi-skill transitions.
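
The ratchet progress reward described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the class name `RatchetProgressReward`, the scalar `progress` signal (for example, normalized distance covered toward the goal), and the `gain` and `stall_penalty` coefficients are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RatchetProgressReward:
    """Illustrative ratchet-style progress reward (not the paper's exact formula)."""

    gain: float = 1.0            # assumed scale applied to new best-so-far progress
    stall_penalty: float = 0.01  # assumed per-step penalty when no new best is reached
    best: float = 0.0            # best task progress observed so far in the episode

    def reset(self, initial_progress: float = 0.0) -> None:
        """Start a new episode from the given progress value."""
        self.best = initial_progress

    def step(self, progress: float) -> float:
        """Reward only progress beyond the previous best; penalize all other steps."""
        improvement = progress - self.best
        if improvement > 0.0:
            self.best = progress  # the ratchet only ever moves forward
            return self.gain * improvement
        return -self.stall_penalty


def episode_return(progress_trace, reward=None):
    """Sum the rewards over a trace whose first entry is the initial progress."""
    if reward is None:
        reward = RatchetProgressReward()
    reward.reset(progress_trace[0])
    return sum(reward.step(p) for p in progress_trace[1:])
```

Under these assumptions the reward depends only on how far the best progress advances, not on how quickly it does so, which is one way to read "velocity-free". Because backing off and re-approaching a previously reached state earns nothing, the policy cannot accumulate reward by oscillating, while the small stall penalty still gives a signal on every step.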