We aim to control a robot so that it physically carries out, in the real world, any high-level language command such as "cartwheel" or "kick." Although human motion datasets exist, the task remains challenging: generative models can produce physically unrealistic motions, and the problem is more severe for robots because their body structures and physical properties differ from those of humans. In addition, controlling a physical robot to perform a desired motion requires learning a control policy. We develop LAnguage-Guided mOtion cONtrol (LAGOON), a multi-phase method that generates physically realistic robot motions from language commands. LAGOON first uses a pre-trained model to generate a human motion from a language command. An RL phase then trains a control policy in simulation to mimic the generated human motion. Finally, with domain randomization, we show that the learned policy can be successfully deployed to a quadrupedal robot, yielding a robot dog that stands up and waves its front legs in the real world to mimic a hand-waving human.
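As a rough illustration of two ingredients named in the abstract, the sketch below shows one common way to score how closely a simulated robot pose mimics a reference pose, and one simple form of domain randomization. It is not the LAGOON implementation: the exponential tracking reward, the uniform scaling of physical parameters, and every name and value here (`tracking_reward`, `sample_randomized_params`, `scale`, `rel_range`, the example parameters) are assumptions made for illustration only.

```python
import math
import random


def tracking_reward(robot_pose, reference_pose, scale=2.0):
    """Return a reward in (0, 1] that peaks when the poses match exactly.

    Hypothetical imitation reward: exp(-scale * squared pose error).
    Both poses are flat sequences of joint values of equal length.
    """
    if len(robot_pose) != len(reference_pose):
        raise ValueError("pose vectors must have the same length")
    sq_err = sum((q - r) ** 2 for q, r in zip(robot_pose, reference_pose))
    return math.exp(-scale * sq_err)


def sample_randomized_params(nominal, rel_range=0.2, rng=None):
    """Scale each nominal parameter by a factor drawn uniformly from
    [1 - rel_range, 1 + rel_range].

    Hypothetical domain randomization: a new sample would be drawn at the
    start of each simulated episode so the policy sees varied dynamics.
    """
    rng = rng or random.Random()
    return {
        name: value * rng.uniform(1.0 - rel_range, 1.0 + rel_range)
        for name, value in nominal.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same sample
    # Illustrative parameter names and values, not taken from the paper.
    nominal = {"friction": 0.8, "payload_kg": 1.0, "motor_strength": 1.0}
    print(sample_randomized_params(nominal, rng=rng))
    print(tracking_reward([0.1, 0.2], [0.1, 0.2]))  # identical poses give 1.0
```

In a setup like this, the reward would be evaluated at every simulation step against the current frame of the generated reference motion, while the randomized parameters would be resampled per episode; the ranges that transfer well to hardware are something one would have to tune empirically.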