In-context imitation learning (ICIL) enables robots to learn new tasks from a small number of demonstrations by conditioning a pre-trained policy on task-specific examples, without retraining at test time. Despite this promise, training generalizable and scalable in-context imitation policies remains an open challenge. We present SynthICL, a scalable framework that trains ICIL policies entirely from RGB-only synthetic data. Specifically, we build a data generation pipeline to produce high-fidelity ICIL data and train a flow-matching transformer policy on the resulting dataset. SynthICL avoids the depth sensing, precise camera calibration, and real-world training data required by prior approaches, offering a simpler and more scalable alternative. We further incorporate subgoal prediction by training the model to predict the next subgoal images, enabling more precise and visually grounded control. Evaluated on 16 unseen real-world manipulation tasks, SynthICL achieves an average success rate of 79% with only one demonstration provided at test time, outperforming prior methods. Project page: https://synth-icl.github.io