Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is to leverage large unlabeled datasets that contain many behaviors and then adapt a policy to a specific task using a small amount of task-specific human supervision (i.e., interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (which includes many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it learns more effectively from the mix of task-specific and offline data than either naively mixing the data or using only the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.
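The query-then-train idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: transitions are represented by precomputed embedding vectors, relevance is measured by cosine similarity to the nearest expert transition, and a fixed `threshold` decides what is retrieved. The function names `retrieve_relevant` and `build_training_set` are hypothetical.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_relevant(
    expert: Sequence[Embedding],
    offline: Sequence[Embedding],
    threshold: float = 0.9,
) -> List[int]:
    """Return indices of offline transitions that resemble some expert transition.

    Assumption: similarity metric and threshold are illustrative choices only.
    """
    if not expert:
        return []
    selected = []
    for i, candidate in enumerate(offline):
        # Score each offline transition by its closest expert transition.
        best = max(cosine_similarity(candidate, e) for e in expert)
        if best >= threshold:
            selected.append(i)
    return selected


def build_training_set(
    expert: Sequence[Embedding],
    offline: Sequence[Embedding],
    threshold: float = 0.9,
) -> List[Embedding]:
    """Combine expert data with the retrieved subset of the offline data."""
    retrieved = [offline[i] for i in retrieve_relevant(expert, offline, threshold)]
    return list(expert) + retrieved


if __name__ == "__main__":
    # Toy 2-D embeddings: two expert transitions, four offline transitions.
    expert_data = [(1.0, 0.0), (0.9, 0.1)]
    offline_data = [(1.0, 0.05), (0.0, 1.0), (-1.0, 0.0), (0.8, 0.2)]
    print(retrieve_relevant(expert_data, offline_data))
    print(len(build_training_set(expert_data, offline_data)))
```

In this toy setup, offline transitions that point in a different direction from every expert transition are filtered out, and a policy would then be trained on the combined set. How the embeddings are learned and how the policy is trained are left out of the sketch.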