APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

Humanoid locomotion has advanced rapidly with deep reinforcement learning (DRL), enabling robust foot-based traversal over uneven terrain. Yet platforms taller than the robot's leg length remain largely out of reach, because current DRL training paradigms often converge to jumping-like solutions that are high-impact, torque-limited, and unsafe for real-world deployment. To address this gap, we propose APEX, a system for perceptive, climbing-based high-platform traversal that composes terrain-conditioned behaviors: climb-up and climb-down at vertical edges, walking or crawling on the platform, and stand-up and lie-down for posture reconfiguration. Central to our approach is a generalized ratchet progress reward for learning contact-rich, goal-reaching maneuvers. It tracks the best-so-far task progress and penalizes non-improving steps, providing dense yet velocity-free supervision that enables efficient exploration under strong safety regularization. Based on this formulation, we train LiDAR-based full-body maneuver policies and reduce the sim-to-real perception gap through a dual strategy: modeling mapping artifacts during training, and applying filtering and inpainting to elevation maps during deployment. Finally, we distill all six skills into a single policy that autonomously selects behaviors and transitions based on local geometry and commands. Experiments on a 29-DoF Unitree G1 humanoid demonstrate zero-shot sim-to-real traversal of 0.8 m platforms (approximately 114% of leg length), with robust adaptation to platform height and initial pose, as well as smooth and stable multi-skill transitions.
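
To make the ratchet idea concrete, the following is a minimal sketch of one way a "best-so-far" progress reward could be written. It is not the formulation used in APEX, which the abstract does not specify. The class name `RatchetProgressReward`, the parameters `gain` and `stall_penalty`, and the scalar `progress` signal (for example, negative distance to the goal) are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RatchetProgressReward:
    """Hypothetical ratchet-style progress reward (illustrative only).

    Rewards only progress beyond the best value seen so far in the episode,
    and applies a small penalty on every step that sets no new best.
    """

    gain: float = 1.0             # assumed scale on newly gained progress
    stall_penalty: float = 0.01   # assumed penalty for a non-improving step
    best: float = float("-inf")   # best-so-far progress in this episode

    def reset(self, initial_progress: float) -> None:
        # Start each episode with the ratchet set at the initial progress.
        self.best = initial_progress

    def step(self, progress: float) -> float:
        if progress > self.best:
            # New best: pay out only the increment, then advance the ratchet.
            reward = self.gain * (progress - self.best)
            self.best = progress
            return reward
        # No improvement over the best so far: constant penalty.
        return -self.stall_penalty


if __name__ == "__main__":
    shaper = RatchetProgressReward(gain=1.0, stall_penalty=0.1)
    shaper.reset(initial_progress=0.0)
    # Assumed progress trace: advance, slip back, recover, advance again.
    for p in [0.2, 0.1, 0.2, 0.5]:
        print(p, shaper.step(p))
```

In this sketch the reward depends only on how far the task has progressed, not on how quickly, so no velocity target is imposed. Slipping back and merely regaining lost ground earns nothing, which is the sense in which the reward behaves like a ratchet.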
