Following the impressive in-context learning capabilities of large transformers, In-Context Imitation Learning (ICIL) offers a promising opportunity for robotics. We introduce Instant Policy, which learns new tasks instantly (without further training) from just one or two demonstrations, achieving ICIL through two key components. First, we introduce inductive biases through a graph representation and model ICIL as a graph generation problem with a learned diffusion process, enabling structured reasoning over demonstrations, observations, and actions. Second, we show that such a model can be trained using pseudo-demonstrations (arbitrary trajectories generated in simulation) as a virtually infinite pool of training data. Simulated and real-world experiments show that Instant Policy enables rapid learning of various everyday robot tasks. We also show how it can serve as a foundation for cross-embodiment and zero-shot transfer to language-defined tasks. Code and videos are available at https://www.robot-learning.uk/instant-policy.