Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that combining large-scale datasets generated by TAMP supervisors with flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to train performant Transformer-based policies. In this paper, we present a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks to shelf and articulated object manipulation, achieving 70 to 80% success rates. Video results and code are available at https://mihdalal.github.io/optimus/