We study the problem of learning to perform multi-stage robotic manipulation tasks, with application to cable routing, in which the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage manipulation: handling deformable objects, closing the loop on visual perception, and executing extended behaviors consisting of multiple steps, each of which must succeed for the entire task to be completed. In such settings, it is impractical to learn individual primitives for each stage that succeed at a high enough rate to complete a temporally extended task: if every stage must succeed and each has a non-negligible probability of failure, the probability of completing the entire task decreases rapidly as stages are added and can become negligible. Successful controllers for such multi-stage tasks must therefore be able to recover from failure and compensate for imperfections in low-level controllers by choosing intelligently which controller to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) level, present an instantiation of this method for learning the cable routing task, and report evaluations showing strong generalization to challenging clip placement variations. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.
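The compounding-failure argument above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that stages succeed independently with the same probability, that retries are independent, and that a failed attempt leaves the stage recoverable. The function names and the numbers used (a 0.9 per-stage success rate, 10 stages, 3 attempts) are hypothetical and are not taken from the paper's experiments.

```python
def chain_success(p_stage: float, n_stages: int) -> float:
    """Probability that n independent stages all succeed on the first try."""
    return p_stage ** n_stages


def chain_success_with_retries(p_stage: float, n_stages: int, max_attempts: int) -> float:
    """Same chain, but each stage may be attempted up to max_attempts times.

    Assumes (hypothetically) that attempts are independent and that a failed
    attempt does not make the stage unrecoverable.
    """
    # A stage fails overall only if every one of its attempts fails.
    p_stage_eventually = 1.0 - (1.0 - p_stage) ** max_attempts
    return p_stage_eventually ** n_stages


if __name__ == "__main__":
    # Illustrative values, not measurements from the paper.
    print(f"no retries:  {chain_success(0.9, 10):.3f}")
    print(f"3 attempts:  {chain_success_with_retries(0.9, 10, 3):.3f}")
```

Under these assumptions, a 10-stage task with a 0.9 per-stage success rate completes only about 35% of the time without recovery, while allowing up to three attempts per stage raises this to roughly 99%. This is the motivation for a sequencing policy that can retry or take corrective action rather than executing the primitives open-loop.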