Imitation learning with visual observations is notoriously inefficient when addressed with end-to-end behavioural cloning methods. In this paper, we explore an alternative paradigm which decomposes reasoning into three phases: first, a retrieval phase, which informs the robot what it can do with an object; second, an alignment phase, which informs the robot where to interact with the object; and third, a replay phase, which informs the robot how to interact with the object. Through a series of real-world experiments on everyday tasks, such as grasping, pouring, and inserting objects, we show that this decomposition brings unprecedented learning efficiency and effective inter- and intra-class generalisation. Videos are available at https://www.robot-learning.uk/retrieval-alignment-replay.
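To make the three-phase decomposition concrete, the following is a minimal toy sketch and not the method used in the paper. Everything in it is an assumption made for illustration: the `Demo` record, the cosine-similarity retrieval, the planar pose `(x, y, theta)` standing in for the output of some unspecified pose estimator, and the function names `retrieve`, `align`, and `replay`.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Demo:
    """Hypothetical stored demonstration (illustrative only)."""
    skill: str               # what can be done with the object, e.g. "grasp"
    embedding: List[float]   # assumed visual descriptor of the demo object
    waypoints: List[Point]   # end-effector path expressed in the object's frame


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(demos: List[Demo], query: List[float]) -> Demo:
    """Retrieval (what): pick the demo whose object looks most similar."""
    return max(demos, key=lambda d: cosine(d.embedding, query))


def align(object_pose: Tuple[float, float, float]) -> Callable[[Point], Point]:
    """Alignment (where): map object-frame points into the world frame.

    object_pose = (x, y, theta) is assumed to come from a pose estimator
    that is outside the scope of this sketch.
    """
    x, y, theta = object_pose
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (x + c * p[0] - s * p[1], y + s * p[0] + c * p[1])


def replay(demo: Demo, to_world: Callable[[Point], Point]) -> List[Point]:
    """Replay (how): re-express the demonstrated path at the new object pose."""
    return [to_world(p) for p in demo.waypoints]
```

In this sketch, a new observation would be embedded, passed to `retrieve` to select a demonstration, and the selected path would then be transformed by `align` and executed via `replay`. The separation means that only the retrieval and alignment steps depend on the new object, while the motion itself is reused unchanged.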