Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans. Code and videos: https://nicklashansen.com/rlpuppeteer
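The two-level structure described above (a high-level agent issuing commands from visual observations, a low-level agent turning them into joint actions) can be sketched as a minimal control loop. This is purely illustrative: the class names, linear policies, and dimensions below are assumptions for the sketch, not the paper's actual architecture or world model.

```python
# Illustrative sketch of a hierarchical control loop.
# HighLevelPolicy / LowLevelPolicy and all dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

class HighLevelPolicy:
    """Maps a visual observation to a low-dimensional command (linear stand-in)."""
    def __init__(self, obs_dim, cmd_dim):
        self.W = rng.standard_normal((cmd_dim, obs_dim)) * 0.01

    def act(self, visual_obs):
        return self.W @ visual_obs

class LowLevelPolicy:
    """Maps proprioception plus a command to joint-level actions (linear stand-in)."""
    def __init__(self, proprio_dim, cmd_dim, action_dim):
        self.W = rng.standard_normal((action_dim, proprio_dim + cmd_dim)) * 0.01

    def act(self, proprio, command):
        # tanh keeps actions bounded, as is typical for continuous control
        return np.tanh(self.W @ np.concatenate([proprio, command]))

# One control step for a 56-DoF humanoid: the high level issues a command
# from visual features; the low level translates it into 56 joint actions.
high = HighLevelPolicy(obs_dim=64, cmd_dim=8)
low = LowLevelPolicy(proprio_dim=56, cmd_dim=8, action_dim=56)

visual_obs = rng.standard_normal(64)   # stand-in for image features
proprio = rng.standard_normal(56)      # stand-in for joint states

command = high.act(visual_obs)
action = low.act(proprio, command)
print(action.shape)  # (56,)
```

In the paper's framing, both levels are trained with reinforcement-learning rewards rather than hand-designed skill primitives; here the linear maps simply stand in for the learned policies.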