CrossLoco: Human Motion Driven Control of Legged Robots via Guided Unsupervised Reinforcement Learning

Human motion driven control (HMDC) is an effective approach for generating natural and compelling robot motions while preserving high-level semantics. However, establishing a correspondence between humans and robots with different body structures is not straightforward: mismatches in kinematic and dynamic properties make the problem intrinsically ambiguous. Many previous algorithms approach this motion retargeting problem with unsupervised learning, which requires a prerequisite set of robot skills. Learning all of these skills without knowledge of the given human motions is extremely costly, particularly for high-dimensional robots. In this work, we introduce CrossLoco, a guided unsupervised reinforcement learning framework that simultaneously learns robot skills and their correspondence to human motions. Our key innovation is a cycle-consistency-based reward term designed to maximize the mutual information between human motions and robot states. We demonstrate that the proposed framework can generate compelling robot motions by translating diverse human motions, such as running, hopping, and dancing. We quantitatively compare CrossLoco against manually engineered and unsupervised baseline algorithms, along with ablated versions of our framework, and demonstrate that our method translates human motions with better accuracy, diversity, and user preference. We also showcase its utility in other applications, such as synthesizing robot movements from language input and enabling interactive robot control.
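
As a rough illustration of why a cycle-consistency reward can act as a proxy for mutual information, one standard argument is the variational lower bound sketched below. This is not necessarily the exact formulation used in CrossLoco. The symbols are assumptions introduced for this sketch only: $h$ is a human motion, $s$ is a robot state, $\mathcal{H}(\cdot)$ is entropy, and $q_\phi(h \mid s)$ is a hypothetical learned model that reconstructs the human motion from the robot state.

```latex
\begin{align}
I(H; S) &= \mathcal{H}(H) - \mathcal{H}(H \mid S) \\
        &= \mathcal{H}(H) + \mathbb{E}_{p(h, s)}\big[\log p(h \mid s)\big] \\
        &\geq \mathcal{H}(H) + \mathbb{E}_{p(h, s)}\big[\log q_\phi(h \mid s)\big]
\end{align}
```

The inequality holds because the gap is the expected KL divergence $\mathbb{E}_{p(s)}\big[\mathrm{KL}\big(p(h \mid s) \,\|\, q_\phi(h \mid s)\big)\big] \geq 0$. If $q_\phi$ is assumed to be a fixed-variance Gaussian centered on a reconstruction $\hat{h}_\phi(s)$, then $\log q_\phi(h \mid s) = -\lVert h - \hat{h}_\phi(s) \rVert^2 / (2\sigma^2) + \text{const}$, so rewarding a small reconstruction error raises this lower bound on the mutual information.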
