Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets - 专知论文

会员服务 ·

0

未标记 · 数据集 · 模仿学习 · 混合数据 · 混合 ·

2023 年 4 月 18 日

Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets

翻译：行为检索：通过查询未标注数据集进行少样本模仿学习

Maximilian Du,Suraj Nair,Dorsa Sadigh,Chelsea Finn

Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many behaviors in them and then adapting a policy to a specific task using a small amount of task-specific human supervision (i.e. interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the relevant transitions to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it is able to learn more effectively from the mix of task-specific and offline data compared to naively mixing the data or only using the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.

翻译：使机器人能够以数据高效的方式学习新颖的视觉运动技能仍是一个尚未解决且充满挑战的问题。解决这一问题的常用范式是利用包含多种行为的大型未标注数据集，然后通过少量特定任务的人工监督（如干预或示范）将策略适配到具体任务。然而，如何最优地利用有限的特定任务监督并平衡其与离线数据的关系仍是一个开放性问题。本文的核心见解在于：任务特定数据不仅为智能体提供了新的训练数据，还能指示智能体应使用何种类型的先验数据进行学习。具体而言，我们提出一种简单方法：利用少量下游专家数据，从离线未标注数据集（包含大量次优行为）中选择性查询相关行为。随后，智能体在专家数据和查询数据上联合训练。实验表明，我们的方法能够仅查询与任务相关的转换，过滤掉次优或无关数据。通过这种方式，与简单混合数据或仅使用任务特定数据相比，该方法能更有效地从任务特定数据与离线数据的组合中学习。此外，我们发现这种简单的查询方法在基于图像的模拟和真实机器人操作任务上，比更复杂的目标条件方法性能提升20%。视频和代码请参见 https://sites.google.com/view/behaviorretrieval。

0

相关内容

未标记

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

专知会员服务

18+阅读 · 2022年2月26日

如何用好对比学习？CVPR2021谷歌ChenTing《自监督视觉表示学习》报告，附视频与Slides

如何用好对比学习？CVPR2021谷歌ChenTing《自监督视觉表示学习》报告，附视频与Slides

专知会员服务

38+阅读 · 2021年6月21日

【UC伯克利】自监督视觉表示学习，356页ppt，Self-Supervised Visual Learning

【UC伯克利】自监督视觉表示学习，356页ppt，Self-Supervised Visual Learning

专知会员服务

66+阅读 · 2021年1月10日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【EMNLP2020】开放领域对话的数据增广的方法：“对话蒸馏”

【EMNLP2020】开放领域对话的数据增广的方法：“对话蒸馏”

专知会员服务

30+阅读 · 2020年9月29日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

67+阅读 · 2020年8月22日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

专知会员服务

113+阅读 · 2019年12月13日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

专知

17+阅读 · 2019年12月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【泡泡一分钟】学习行人如何导航：一种深度逆强化学习的方法

【泡泡一分钟】学习行人如何导航：一种深度逆强化学习的方法

泡泡机器人SLAM

20+阅读 · 2019年4月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

面向异分布数据的主动学习方法

国家自然科学基金

12+阅读 · 2015年12月31日

数据驱动下矿渣微粉生产过程的智能控制

国家自然科学基金

0+阅读 · 2014年12月31日

微尺度液滴在高太阳辐射强度下耦合换热性能机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

拉伸变形过程中非晶合金纳米尺度结构不均匀性的演化

国家自然科学基金

0+阅读 · 2013年12月31日

金属晶粒长大动力学的多尺度模拟

国家自然科学基金

0+阅读 · 2012年12月31日

棉花根区盐分差异分布减轻盐害的机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

上下文感知的Web服务自适应计算模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

并行数据和调查数据质量管理

国家自然科学基金

0+阅读 · 2011年12月31日

基于动态激光散斑方法的纳米流体中纳米颗粒状态和行为的光学表征

国家自然科学基金

0+阅读 · 2011年12月31日

基于动态分层与自学习的多智能体自适应协作模型

国家自然科学基金

17+阅读 · 2008年12月31日

Cooperative Open-ended Learning Framework for Zero-shot Coordination

Arxiv

0+阅读 · 2023年5月31日

Joint Adaptive Representations for Image-Language Learning

Arxiv

0+阅读 · 2023年5月31日

LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting

Arxiv

0+阅读 · 2023年5月31日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Recent Advances of Continual Learning in Computer Vision: An Overview

Recent Advances of Continual Learning in Computer Vision: An Overview

Arxiv

23+阅读 · 2021年9月23日

Multi-Agent Simulation for AI Behaviour Discovery in Operations Research

Arxiv

40+阅读 · 2021年8月30日

Federated Learning Meets Natural Language Processing: A Survey

Arxiv

19+阅读 · 2021年7月27日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

Few-shot Learning: A Survey

Few-shot Learning: A Survey

Arxiv

363+阅读 · 2019年4月10日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

2+阅读 · 6月16日

多模态代码智能综述：从视觉输入到可执行代码系统

多模态代码智能综述：从视觉输入到可执行代码系统

专知会员服务

1+阅读 · 6月16日

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

4+阅读 · 6月16日

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

3+阅读 · 6月16日

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

13+阅读 · 6月16日

《通过小型无人机系统将情报能力“作战化”》

《通过小型无人机系统将情报能力“作战化”》

专知会员服务

4+阅读 · 6月16日

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

专知会员服务

8+阅读 · 6月16日

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

20+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

8+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

9+阅读 · 6月15日

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《网络空间兵棋推演：挑战、局限性与混合路径》报告

专知会员服务

9+阅读 · 6月15日

《离线语言支持系统：面向空战战术决策》

《离线语言支持系统：面向空战战术决策》

专知会员服务

9+阅读 · 6月15日

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

专知会员服务

8+阅读 · 6月15日

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

6+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

6+阅读 · 6月14日

相关VIP内容

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

专知会员服务

18+阅读 · 2022年2月26日

如何用好对比学习？CVPR2021谷歌ChenTing《自监督视觉表示学习》报告，附视频与Slides

如何用好对比学习？CVPR2021谷歌ChenTing《自监督视觉表示学习》报告，附视频与Slides

专知会员服务

38+阅读 · 2021年6月21日

【UC伯克利】自监督视觉表示学习，356页ppt，Self-Supervised Visual Learning

【UC伯克利】自监督视觉表示学习，356页ppt，Self-Supervised Visual Learning

专知会员服务

66+阅读 · 2021年1月10日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【EMNLP2020】开放领域对话的数据增广的方法：“对话蒸馏”

【EMNLP2020】开放领域对话的数据增广的方法：“对话蒸馏”

专知会员服务

30+阅读 · 2020年9月29日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

67+阅读 · 2020年8月22日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

专知会员服务

113+阅读 · 2019年12月13日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

多模态代码智能综述：从视觉输入到可执行代码系统

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

相关资讯

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

专知

17+阅读 · 2019年12月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【泡泡一分钟】学习行人如何导航：一种深度逆强化学习的方法

【泡泡一分钟】学习行人如何导航：一种深度逆强化学习的方法

泡泡机器人SLAM

20+阅读 · 2019年4月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Cooperative Open-ended Learning Framework for Zero-shot Coordination

Arxiv

0+阅读 · 2023年5月31日

Joint Adaptive Representations for Image-Language Learning

Arxiv

0+阅读 · 2023年5月31日

LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting

Arxiv

0+阅读 · 2023年5月31日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Recent Advances of Continual Learning in Computer Vision: An Overview

Recent Advances of Continual Learning in Computer Vision: An Overview

Arxiv

23+阅读 · 2021年9月23日

Multi-Agent Simulation for AI Behaviour Discovery in Operations Research

Arxiv

40+阅读 · 2021年8月30日

Federated Learning Meets Natural Language Processing: A Survey

Arxiv

19+阅读 · 2021年7月27日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

Few-shot Learning: A Survey

Few-shot Learning: A Survey

Arxiv

363+阅读 · 2019年4月10日

相关基金

面向异分布数据的主动学习方法

国家自然科学基金

12+阅读 · 2015年12月31日

数据驱动下矿渣微粉生产过程的智能控制

国家自然科学基金

0+阅读 · 2014年12月31日

微尺度液滴在高太阳辐射强度下耦合换热性能机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

拉伸变形过程中非晶合金纳米尺度结构不均匀性的演化

国家自然科学基金

0+阅读 · 2013年12月31日

金属晶粒长大动力学的多尺度模拟

国家自然科学基金

0+阅读 · 2012年12月31日

棉花根区盐分差异分布减轻盐害的机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

上下文感知的Web服务自适应计算模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

并行数据和调查数据质量管理

国家自然科学基金

0+阅读 · 2011年12月31日

基于动态激光散斑方法的纳米流体中纳米颗粒状态和行为的光学表征

国家自然科学基金

0+阅读 · 2011年12月31日

基于动态分层与自学习的多智能体自适应协作模型

国家自然科学基金

17+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员