Humanoid perceptive locomotion has made significant progress, yet robust multi-directional locomotion on complex terrains remains underexplored. To tackle this challenge, we propose RPL, a two-stage training framework that enables multi-directional locomotion on challenging terrains while remaining robust to payloads. RPL first trains terrain-specific expert policies with privileged height map observations to master decoupled locomotion and manipulation skills across different terrains, and then distills them into a transformer policy that uses multiple depth cameras to cover a wide field of view. During distillation, we introduce two techniques that improve the robustness of multi-directional locomotion: depth feature scaling based on velocity commands, and random side masking. Both are critical for handling asymmetric depth observations and unseen terrain widths. For scalable depth distillation, we develop an efficient multi-depth system that ray-casts against both dynamic robot meshes and static terrain meshes in massively parallel environments, achieving a 5x speedup over the depth rendering pipelines in existing simulators while modeling realistic sensor latency, noise, and dropout. Extensive real-world experiments demonstrate robust multi-directional locomotion with 2 kg payloads across challenging terrains, including 20° slopes, staircases with different step lengths (22 cm, 25 cm, 30 cm), and 25 cm by 25 cm stepping stones separated by 60 cm gaps.
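The abstract names two distillation-time techniques, random side masking and depth feature scaling based on velocity commands, without specifying how they are computed. The sketch below is one plausible reading, not the authors' implementation. Every name and parameter in it (`random_side_mask`, `velocity_scale`, `max_frac`, `floor`, `cam_yaw`) is hypothetical. So are the two rules themselves: blanking a random-width strip on one image edge, and weighting a camera's features by the cosine between the commanded heading and the camera's viewing direction.

```python
import math
import random


def random_side_mask(depth, max_frac=0.3, fill=0.0, rng=random):
    """Blank a random-width strip on the left or right edge of a depth image.

    Assumed form of "random side masking": depth is a list of rows (lists of
    floats). A new image is returned and the input is left untouched.
    """
    width = len(depth[0])
    strip = rng.randint(0, int(max_frac * width))  # columns to blank (may be 0)
    left = rng.random() < 0.5                      # which edge to blank
    out = [row[:] for row in depth]
    for row in out:
        for c in range(strip):
            row[c if left else width - 1 - c] = fill
    return out


def velocity_scale(features, cam_yaw, cmd_vx, cmd_vy, floor=0.2):
    """Scale one camera's feature vector by its alignment with the command.

    Assumed form of "depth feature scaling based on velocity commands":
    cameras facing the commanded direction keep full weight, while others
    are attenuated down to `floor`.
    """
    speed = math.hypot(cmd_vx, cmd_vy)
    if speed < 1e-6:
        return list(features)  # no commanded motion: leave features unscaled
    # Cosine between the commanded heading and the camera viewing direction.
    align = (cmd_vx * math.cos(cam_yaw) + cmd_vy * math.sin(cam_yaw)) / speed
    weight = floor + (1.0 - floor) * max(0.0, align)
    return [weight * f for f in features]
```

Under these assumptions, a forward velocity command leaves a forward-facing camera's features unchanged and attenuates a rear-facing camera's features to the `floor` weight. The masking step hides part of the terrain's lateral extent, so the student policy cannot rely on seeing the full terrain width.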