We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) reconstruct a physically plausible key-point trajectory from monocular video; 2) optimize, offline, a dynamically feasible reference trajectory for the robot, including body motion, foot motion, and contact sequences, that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo relies only on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube, and converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best of the authors' knowledge, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.
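The three-stage structure described above can be illustrated with a deliberately simplified sketch. This is not the SLoMo implementation: every function name, parameter, and data value below is hypothetical, the "robot" is a one-dimensional single integrator, and each stage is replaced by a trivial stand-in (a moving average for key-point reconstruction, a speed-limited projection for offline trajectory optimization, and a one-step lookahead controller in place of model-predictive control). It only shows how the output of each stage feeds the next.

```python
from typing import List


def reconstruct_keypoints(raw: List[float], window: int = 3) -> List[float]:
    """Stage 1 stand-in (hypothetical): smooth noisy per-frame key-point
    heights with a centered moving average."""
    half = window // 2
    out = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        out.append(sum(raw[lo:hi]) / (hi - lo))
    return out


def optimize_reference(keypoints: List[float], dt: float, v_max: float) -> List[float]:
    """Stage 2 stand-in (hypothetical): follow the key points as closely as
    possible while limiting the per-step speed to v_max."""
    ref = [keypoints[0]]
    max_step = v_max * dt
    for target in keypoints[1:]:
        step = max(-max_step, min(max_step, target - ref[-1]))
        ref.append(ref[-1] + step)
    return ref


def track_online(ref: List[float], x0: float, dt: float, v_max: float) -> List[float]:
    """Stage 3 stand-in (hypothetical): one-step lookahead tracking of the
    reference for a single integrator x' = u with |u| <= v_max."""
    x = x0
    states = [x]
    for target in ref[1:]:
        u = max(-v_max, min(v_max, (target - x) / dt))
        x = x + u * dt
        states.append(x)
    return states


# Made-up key-point heights (metres), standing in for video-derived data.
raw_heights = [0.30, 0.32, 0.29, 0.45, 0.60, 0.58, 0.40, 0.31]
dt, v_max = 0.1, 1.0

keypoints = reconstruct_keypoints(raw_heights)
reference = optimize_reference(keypoints, dt, v_max)
states = track_online(reference, x0=reference[0], dt=dt, v_max=v_max)
```

The point of the split is that the reference produced offline already respects the (toy) dynamics limit, so the online controller only has to track it rather than repair it.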