We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and executing temporally extended behaviors composed of multiple steps, each of which must succeed for the task as a whole to succeed. In such settings, it is impractical to learn individual primitives for each stage that succeed often enough to complete the full task: if every stage must succeed and each has a non-negligible probability of failure, the probability of completing the entire task shrinks with every added stage and can become negligible. Successful controllers for such multi-stage tasks must therefore be able to recover from failure and compensate for imperfections in low-level controllers by choosing which controller to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning method that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) levels, present a system that instantiates this method to learn the cable routing task, and report evaluations showing strong generalization to very challenging variations in clip placement. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.
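The compounding-failure argument can be made concrete with a short calculation. The sketch below is illustrative only and does not reproduce anything from the paper's evaluation. It assumes that stages succeed independently with the same probability, and that a failed attempt leaves the stage in a state from which it can simply be retried; both are idealizations. The function names and the numbers used are hypothetical.

```python
def chain_success(p_stage: float, n_stages: int) -> float:
    """Probability that every stage succeeds on its single attempt.

    Assumes independent stages that share one success probability.
    """
    return p_stage ** n_stages


def chain_success_with_retries(p_stage: float, n_stages: int, max_attempts: int) -> float:
    """Probability that every stage succeeds within max_attempts tries.

    Idealized retry model (an assumption): attempts are independent and
    a failure never puts the system in an unrecoverable state.
    """
    # A stage fails only if all of its attempts fail.
    per_stage = 1.0 - (1.0 - p_stage) ** max_attempts
    return per_stage ** n_stages


if __name__ == "__main__":
    p = 0.9  # hypothetical per-stage success rate
    for n in (1, 3, 5, 10):
        no_retry = chain_success(p, n)
        with_retry = chain_success_with_retries(p, n, max_attempts=3)
        print(f"stages={n:2d}  no retry={no_retry:.3f}  up to 3 attempts={with_retry:.3f}")
```

Under these assumptions, a 90% per-stage success rate gives roughly a 35% chance of finishing ten stages without retries, while allowing a few retries per stage keeps the overall probability close to one. This is the motivation for a high-level policy that can decide when to retry or take corrective action.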