Demonstrations and natural language instructions are two common ways to specify and teach robots novel tasks. However, for many complex tasks, a demonstration or a language instruction alone is ambiguous and cannot specify the task clearly. In such cases, a demonstration combined with an instruction conveys the task to the robot more concisely and effectively than either modality alone. To instantiate this problem setting, we train a single multi-task policy on a few hundred challenging robotic pick-and-place tasks and propose DeL-TaCo (Joint Demo-Language Task Conditioning), a method for conditioning a robotic policy on task embeddings composed of two components: a visual demonstration and a language instruction. By allowing these two modalities to disambiguate and clarify each other during novel task specification, DeL-TaCo (1) substantially decreases the teacher effort needed to specify a new task and (2) generalizes better to novel objects and instructions than previous task-conditioning methods. To our knowledge, this is the first work to show that simultaneously conditioning a multi-task robotic manipulation policy on both demonstration and language embeddings improves sample efficiency and generalization over conditioning on either modality alone. Additional materials are available at https://deltaco-robot.github.io/
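To make the conditioning idea concrete, the following is a minimal sketch rather than the DeL-TaCo implementation. It assumes that the demonstration and the instruction have already been encoded into fixed-length vectors, that the two components are joined by simple concatenation, and that the policy is a toy linear map; the names `make_task_embedding` and `linear_policy`, the vector sizes, and all numeric values are hypothetical.

```python
from typing import List, Sequence


def make_task_embedding(demo_emb: Sequence[float],
                        lang_emb: Sequence[float]) -> List[float]:
    """Join the two task-specification components into one vector.

    Assumption: concatenation. The abstract only states that the task
    embedding has a demonstration part and a language part.
    """
    return list(demo_emb) + list(lang_emb)


def linear_policy(weights: Sequence[Sequence[float]],
                  obs: Sequence[float],
                  task_emb: Sequence[float]) -> List[float]:
    """Toy stand-in for a learned policy: action = W @ [obs; task_emb]."""
    x = list(obs) + list(task_emb)
    for row in weights:
        if len(row) != len(x):
            raise ValueError("each weight row must match the input length")
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Hypothetical pre-computed embeddings and observation features.
demo_emb = [0.5, -1.0]   # from a visual demonstration encoder (assumed)
lang_emb = [2.0]         # from a language instruction encoder (assumed)
obs = [1.0, 0.0]         # current observation features (assumed)

task_emb = make_task_embedding(demo_emb, lang_emb)

# Two action dimensions, five inputs (2 observation + 3 task embedding).
weights = [
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 1.0],
]
action = linear_policy(weights, obs, task_emb)
print(task_emb, action)
```

Conditioning on a single modality corresponds, in this sketch, to passing only `demo_emb` or only `lang_emb` as the task embedding, with the weight rows resized to match.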