We propose to learn legged robot locomotion skills by watching thousands of wild animal videos from the internet, such as those featured in nature documentaries. Such videos offer a rich and diverse collection of plausible motion examples that can inform how robots should move. To this end, we introduce Reinforcement Learning from Wild Animal Videos (RLWAV), a method to ground these motions in physical robots. We first train a video classifier on a large-scale animal video dataset to recognize actions from RGB clips of animals in their natural habitats. We then train a multi-skill policy to control a robot in a physics simulator, using as the reinforcement learning reward the classifier's score on third-person videos of the robot's movements. Finally, we transfer the learned policy directly to a real Solo quadruped. Remarkably, despite the extreme gap in both domain and embodiment between animals in the wild and robots, our approach enables the policy to learn diverse skills such as walking, jumping, and keeping still, without relying on reference trajectories or skill-specific rewards.
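The core idea above, using a frozen video classifier's score on clips of the robot as an RL reward, can be sketched minimally as follows. This is not the paper's implementation: the real RLWAV classifier is trained on wild animal videos, whereas the `ToyVideoClassifier` here is a hypothetical stand-in that scores inter-frame motion so the example stays self-contained and runnable.

```python
import numpy as np

class ToyVideoClassifier:
    """Hypothetical stand-in for the frozen animal-action classifier.

    It assigns skill probabilities from mean inter-frame motion of a
    grayscale clip, purely for illustration of the reward interface.
    """

    SKILLS = ["keeping still", "walking", "jumping"]

    def predict_proba(self, clip):
        # clip: (T, H, W) array of grayscale frames in [0, 1].
        motion = np.abs(np.diff(clip, axis=0)).mean()
        logits = np.array([-motion, motion, 2.0 * motion - 1.0])
        exp = np.exp(logits - logits.max())  # stable softmax
        return exp / exp.sum()


def skill_reward(classifier, clip, target_skill):
    """RL reward = classifier score of the commanded skill for a
    third-person clip of the robot, as described in the abstract."""
    probs = classifier.predict_proba(clip)
    return float(probs[classifier.SKILLS.index(target_skill)])


clf = ToyVideoClassifier()
still_clip = np.zeros((8, 4, 4))  # motionless robot: no frame-to-frame change
reward = skill_reward(clf, still_clip, "keeping still")
```

In an RL loop, `skill_reward` would be called once per rollout segment on frames rendered by the simulator's third-person camera, conditioning the reward on the currently commanded skill; no reference trajectories or hand-designed skill rewards are needed.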