While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisely grasping, reorienting, and inserting multiple parts over long horizons and multiple task phases. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. These help expand dataset support and supervise the model with locally corrective actions near bottleneck regions requiring high precision. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps directly from RGB images, outperforming imitation and data augmentation baselines. Project website: https://imitation-juicer.github.io/.