Continual Robot Learning using Self-Supervised Task Inference

Endowing robots with the human ability to learn a growing set of skills over a lifetime, rather than to master single tasks, is an open problem in robot learning. Multi-task learning approaches have been proposed to address this problem, but they pay little attention to task inference. To continually learn new tasks, a robot first needs to infer the task at hand without relying on predefined task representations. In this paper, we propose a self-supervised task inference approach. Our approach learns action and intention embeddings by self-organizing the observed movement and effect parts of unlabeled demonstrations, and learns a higher-level behavior embedding by self-organizing the joint action-intention embeddings. We construct a behavior-matching self-supervised learning objective to train a novel Task Inference Network (TINet) that maps an unlabeled demonstration to its nearest behavior embedding, which we use as the task representation. A multi-task policy is built on top of the TINet and trained with reinforcement learning to optimize performance over tasks. We evaluate our approach in the fixed-set and continual multi-task learning settings with a humanoid robot and compare it to different multi-task learning baselines. The results show that our approach outperforms the baselines, with a more pronounced difference in the challenging continual learning setting, and that it can infer tasks from incomplete demonstrations. In one-shot task generalization experiments, our approach also generalizes to unseen tasks from a single demonstration.
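As an illustration of the final mapping step described above (assigning a demonstration to its nearest behavior embedding), the following is a minimal sketch and not the paper's implementation. It assumes that a demonstration has already been encoded as a fixed-length vector and that behavior embeddings are stored as prototype vectors compared by Euclidean distance. The function name `nearest_behavior`, the 2-D vectors, and the distance metric are all hypothetical choices made for the example.

```python
import math
from typing import Sequence


def nearest_behavior(
    embedding: Sequence[float],
    prototypes: Sequence[Sequence[float]],
) -> int:
    """Return the index of the prototype closest to `embedding`.

    Euclidean distance is an assumption made for this sketch; the
    abstract does not state which metric is used.
    """
    if not prototypes:
        raise ValueError("need at least one behavior prototype")
    distances = [math.dist(embedding, p) for p in prototypes]
    # Pick the index whose distance is smallest.
    return min(range(len(prototypes)), key=distances.__getitem__)


# Hypothetical 2-D behavior embeddings (illustrative values only).
behavior_prototypes = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.5)]

# Hypothetical encoding of one unlabeled demonstration.
demo_embedding = (0.9, 1.2)

task_id = nearest_behavior(demo_embedding, behavior_prototypes)
# The selected prototype plays the role of the task representation
# that a multi-task policy could be conditioned on.
task_representation = behavior_prototypes[task_id]
```

In this sketch the policy would receive `task_representation` as an extra input alongside the robot's observation. How the embeddings are learned and how the policy consumes them are outside what the abstract specifies.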

