Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world. They are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design, and it suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework for learning whole-body humanoid control with sequential contacts by naturally decomposing tasks into separate contact stages. This decomposition enables simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrate that end-to-end RL-based controllers trained with WoCoCo accomplish four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoids by applying it to loco-manipulation tasks on a 22-DoF dinosaur robot.