Multi-task learning of deformable object manipulation is a challenging problem in robot manipulation. Most previous works address this problem in a goal-conditioned way and adopt goal images to specify different tasks, which limits multi-task learning performance and prevents generalization to new tasks. We therefore adopt language instructions to specify deformable object manipulation tasks and propose a learning framework. We first design a unified Transformer-based architecture that understands multi-modal data and outputs picking and placing actions. In addition, we introduce the visible connectivity graph to tackle the nonlinear dynamics and complex configurations of deformable objects. Both simulated and real-world experiments demonstrate that the proposed method is effective and generalizes to unseen instructions and tasks. Compared with the state-of-the-art method, our method achieves higher success rates (87.2% on average) and has a 75.6% shorter inference time.