Legged locomotion over varied terrains is challenging and requires precise perception of the robot and its surroundings from both proprioception and vision. However, learning directly from high-dimensional visual input is often data-inefficient and complex. To address this issue, traditional methods first learn a teacher policy with access to privileged information and then train a student policy to imitate the teacher's behavior from visual input. Despite some progress, this imitation framework prevents the student policy from reaching optimal performance due to the information gap between the two inputs. Furthermore, the learning process is unnatural: animals intuitively learn to traverse different terrains based on their understanding of the world, without privileged knowledge. Inspired by this natural ability, we propose a simple yet effective method, World Model-based Perception (WMP), which builds a world model of the environment and learns a policy on top of it. We show that, although trained entirely in simulation, the world model makes accurate predictions of real-world trajectories, thus providing informative signals for the policy controller. Extensive simulated and real-world experiments demonstrate that WMP outperforms state-of-the-art baselines in traversability and robustness. Videos and code are available at: https://wmp-loco.github.io/.
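The core idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a recurrent world model that compresses high-dimensional visual observations into a compact latent state, with the policy conditioned on that latent plus proprioception instead of raw pixels. All dimensions, parameter names, and the update rule are illustrative assumptions; the actual method's architecture and training procedure are described in the paper.

```python
import numpy as np

# Illustrative sketch of world-model-based perception (hypothetical
# shapes and randomly initialised weights standing in for learned ones).
rng = np.random.default_rng(0)

LATENT, OBS, PROPRIO, ACT = 32, 64, 12, 4

W_enc = rng.normal(scale=0.1, size=(OBS, LATENT))           # vision encoder
W_rec = rng.normal(scale=0.1, size=(LATENT + ACT, LATENT))  # recurrent dynamics
W_pi = rng.normal(scale=0.1, size=(LATENT + PROPRIO, ACT))  # policy head

def world_model_step(h, obs, action):
    """Advance the latent state: predict from dynamics, correct with vision."""
    prior = np.tanh(np.concatenate([h, action]) @ W_rec)  # model's prediction
    posterior = np.tanh(prior + obs @ W_enc)              # fuse visual evidence
    return posterior

def policy(h, proprio):
    """The policy sees the world-model latent, never raw high-dim input."""
    return np.tanh(np.concatenate([h, proprio]) @ W_pi)

h = np.zeros(LATENT)
action = np.zeros(ACT)
for _ in range(10):  # one short rollout
    obs = rng.normal(size=OBS)          # placeholder vision features
    proprio = rng.normal(size=PROPRIO)  # placeholder joint states / IMU
    h = world_model_step(h, obs, action)
    action = policy(h, proprio)
```

Because the policy consumes only the latent state, an accurate world model lets the same controller run on real-world observations despite being trained entirely in simulation, which is the property the experiments evaluate.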