This paper addresses the challenge of terrain-adaptive dynamic locomotion in humanoid robots, a problem traditionally tackled by optimization-based methods or reinforcement learning (RL). Optimization-based methods, such as model-predictive control (MPC), excel at finding optimal reaction forces and achieving agile locomotion, especially for quadrupeds, but struggle with the nonlinear hybrid dynamics of legged systems and the real-time computation of step location, timing, and reaction forces. Conversely, RL-based methods show promise in navigating dynamic and rough terrains but are limited by their extensive data requirements. We introduce a novel locomotion architecture that integrates a neural network policy, trained through RL in simplified environments, with a state-of-the-art motion controller combining MPC and whole-body impulse control (WBIC). The policy efficiently learns high-level locomotion strategies, such as gait selection and step positioning, without requiring full-dynamics simulations. This control architecture enables humanoid robots to dynamically traverse discrete terrains, making strategic locomotion decisions (e.g., walking, jumping, and leaping) based on ground height maps. Our results demonstrate that the integrated architecture achieves dynamic locomotion with significantly fewer training samples than conventional RL-based methods and can be transferred to different humanoid platforms without additional training. The control architecture has been extensively tested in dynamic simulations, accomplishing terrain-height-based dynamic locomotion for three different robots.