Learning a single universal policy that can perform a diverse set of manipulation tasks is a promising new direction in robotics. However, existing techniques are limited to learning policies that can only perform tasks encountered during training, and they require a large number of demonstrations to learn new tasks. Humans, on the other hand, can often learn a new task from a single unannotated demonstration. In this work, we propose the Invariance-Matching One-shot Policy learning (IMOP) algorithm. In contrast to the standard practice of learning the end-effector's pose directly, IMOP first learns invariant regions of the state space for a given task, and then computes the end-effector's pose by matching the invariant regions between demonstrations and test scenes. Trained on the 18 RLBench tasks, IMOP consistently outperforms the state-of-the-art, with a success rate that is 4.5% higher on average over the 18 tasks. More importantly, IMOP can learn a novel task from a single unannotated demonstration and without any fine-tuning, achieving an average success-rate improvement of 11.5% over the state-of-the-art on 22 novel tasks selected across nine categories. IMOP can also generalize to new shapes and learn to manipulate objects different from those in the demonstration. Further, IMOP can perform one-shot sim-to-real transfer using a single real-robot demonstration.
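To make the pose-by-matching idea concrete: once invariant regions are matched between the demonstration and the test scene, the end-effector's pose can be recovered from the corresponding 3D points. The abstract does not specify IMOP's exact pose-recovery procedure, so the sketch below is only an illustrative assumption, using the standard Kabsch/SVD rigid-alignment step on hypothetical matched region points (`demo_pts`, `test_pts` are names introduced here for illustration).

```python
import numpy as np

def pose_from_matched_regions(demo_pts, test_pts):
    """Illustrative sketch (not the paper's exact method): recover the rigid
    transform (R, t) that maps matched invariant-region points from the
    demonstration onto their correspondences in the test scene, via the
    Kabsch/SVD procedure. demo_pts, test_pts: (N, 3) arrays of matched points.
    """
    # Center both point sets on their centroids.
    demo_c = demo_pts.mean(axis=0)
    test_c = test_pts.mean(axis=0)
    # Cross-covariance of the centered correspondences.
    H = (demo_pts - demo_c).T @ (test_pts - test_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the recovered rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = test_c - R @ demo_c
    return R, t
```

Applying the recovered `(R, t)` to the end-effector pose annotated in the demonstration would then yield a candidate pose in the test scene; the same alignment holds even when the matched object differs in shape, as long as the invariant regions correspond.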