Recent advances in robot learning have shown promise in enabling robots to perform a variety of manipulation tasks and to generalize to novel scenarios. One of the key factors contributing to this progress is the scale of the robot data used to train the models. To obtain large-scale datasets, prior approaches have relied either on demonstrations that require extensive human involvement or on engineering-heavy autonomous data collection schemes, both of which are challenging to scale. To mitigate this issue, we propose an alternative route: leveraging text-to-image foundation models, widely used in computer vision and natural language processing, to obtain meaningful data for robot learning without requiring additional robot data. We term our method Robot Learning with Semantically Imagined Experience (ROSIE). Specifically, we use state-of-the-art text-to-image diffusion models and perform aggressive data augmentation on top of our existing robotic manipulation datasets by inpainting, with text guidance, various unseen objects for manipulation, backgrounds, and distractors. Through extensive real-world experiments, we show that manipulation policies trained on data augmented in this way are able to solve completely unseen tasks with new objects and can behave more robustly with respect to novel distractors. In addition, we find that training with the diffusion-based data augmentation improves the robustness and generalization of high-level robot learning tasks such as success detection. The project's website and videos can be found at diffusion-rosie.github.io
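The text-guided inpainting augmentation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ROSIE's implementation: the `inpaint` callable is a hypothetical stand-in for a text-to-image diffusion inpainting model, a mask marking the region to replace is assumed to be available for every frame, and the names `Episode` and `augment_episode` are illustrative only. The sketch shows the core idea: generated content replaces only the masked pixels, and the language instruction is rewritten only when a new manipulation object is inpainted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

import numpy as np


@dataclass
class Episode:
    frames: List[np.ndarray]  # H x W x 3 uint8 images
    masks: List[np.ndarray]   # H x W bool, True where content is replaced
    instruction: str          # natural-language task instruction


# Hypothetical signature for a text-guided inpainting model (assumption):
# (image, mask, text prompt) -> image of the same shape.
InpaintFn = Callable[[np.ndarray, np.ndarray, str], np.ndarray]


def augment_episode(
    ep: Episode,
    inpaint: InpaintFn,
    prompt: str,
    old_obj: Optional[str] = None,
    new_obj: Optional[str] = None,
) -> Episode:
    """Return a new episode with masked regions inpainted from `prompt`.

    If `old_obj` and `new_obj` are given (a new manipulation object), the
    instruction is rewritten; for backgrounds or distractors it is kept.
    """
    new_frames = []
    for frame, mask in zip(ep.frames, ep.masks):
        generated = inpaint(frame, mask, prompt)
        # Keep original pixels outside the mask; use generated ones inside.
        out = np.where(mask[..., None], generated, frame)
        new_frames.append(out.astype(frame.dtype))

    instruction = ep.instruction
    if old_obj is not None and new_obj is not None:
        instruction = instruction.replace(old_obj, new_obj)

    return Episode(new_frames, [m.copy() for m in ep.masks], instruction)
```

In practice `inpaint` would wrap a real diffusion model; for exercising the masking and relabeling logic, a stub that fills the image with a constant color is enough. Producing a new `Episode` rather than modifying the input keeps the original data intact, so one source episode can yield many augmented variants.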