We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and executing extended behaviors consisting of multiple steps, each of which must succeed for the entire task to be completed. In such settings, learning individual primitives for each stage that succeed at a high enough rate to complete a temporally extended task is impractical: if every stage must succeed and each has a non-negligible probability of failure, the probability of completing the entire task shrinks with every added stage and quickly becomes negligible. Therefore, successful controllers for such multi-stage tasks must be able to recover from failure and compensate for imperfections in low-level controllers by choosing which controller to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) level, present a system that instantiates this method to learn the cable routing task, and report evaluations showing strong generalization to challenging variations in clip placement. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.
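
The compounding-failure argument in the abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the described system: it assumes that stages succeed independently with the same probability `p`, and that each retry is an independent fresh attempt. The function names and the example values are hypothetical.

```python
def open_loop_success(p: float, n_stages: int) -> float:
    """Probability that all stages succeed on the first attempt.

    Assumes independent stages with identical per-stage success rate p.
    """
    return p ** n_stages


def success_with_retries(p: float, n_stages: int, max_attempts: int) -> float:
    """Probability that all stages succeed when each stage may be retried.

    Assumes each attempt at a stage is independent, so a stage fails only
    if all max_attempts attempts fail.
    """
    per_stage = 1.0 - (1.0 - p) ** max_attempts
    return per_stage ** n_stages


if __name__ == "__main__":
    # Hypothetical per-stage success rate of 0.9, for illustration only.
    for n in (1, 3, 5, 10):
        print(
            f"stages={n:2d}  "
            f"no retries={open_loop_success(0.9, n):.3f}  "
            f"up to 3 attempts={success_with_retries(0.9, n, 3):.3f}"
        )
```

Under these simplifying assumptions, executing the stages once in sequence loses reliability geometrically in the number of stages, while allowing a few retries per stage keeps the overall success rate close to one. A real system additionally has to detect that a stage failed and decide what to do next, which is the role the abstract assigns to the upper (sequencing) level.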