Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations - 专知论文

会员服务 ·

0

逆强化学习 · 演示 · 机器人 · 自适应 · 异质 ·

2023 年 3 月 29 日

Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations

翻译：基于示范的快速终身自适应逆向强化学习

Letian Chen,Sravan Jayanthi,Rohan Paleja,Daniel Martin,Viacheslav Zakharov,Matthew Gombolay

Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.

翻译：从示范中学习（LfD）方法使终端用户能够通过对期望行为的示范来教机器人新任务，从而推动机器人技术的普及。然而，当前的LfD框架既无法快速适应异构的人类示范，也无法在普适机器人应用中大规模部署。本文提出了一种新颖的LfD框架——快速终身自适应逆向强化学习（FLAIR）。我们的方法：（1）利用已习得策略构建策略混合体，以快速适应新示范，实现终端用户的快速个性化；（2）跨示范提取共同知识，实现精确的任务推断；（3）仅在终身部署需要时扩展模型，维护一组简洁的原型策略，并通过策略混合体逼近所有行为。实验验证表明，FLAIR实现了适应性（即机器人适应异构的、用户特定的任务偏好）、效率（即机器人实现样本高效的适应）和可扩展性（即模型随示范数量呈次线性增长，同时保持高性能）。在三个控制任务中，FLAIR超越基准方法，策略回报平均提高57%，使用策略混合体进行示范建模所需回合数平均减少78%。最后，我们在乒乓球任务中验证了FLAIR的成功，发现用户对FLAIR的任务表现（p<.05）和个性化表现（p<.05）评分更高。

0

相关内容

逆强化学习

逆强化学习

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

专知会员服务

46+阅读 · 2023年1月26日

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

24+阅读 · 2022年3月19日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

【ICML2021】授权驱动探索的元强化学习

专知会员服务

28+阅读 · 2021年5月24日

「元强化学习」报告，斯坦福Chelsea Finn讲解，52页ppt，Meta Reinforcement Learning

「元强化学习」报告，斯坦福Chelsea Finn讲解，52页ppt，Meta Reinforcement Learning

专知会员服务

43+阅读 · 2021年1月11日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

43+阅读 · 2020年4月11日

【斯坦福大学Chelsea Finn-NeurIPS 2019】贝叶斯元学习

【斯坦福大学Chelsea Finn-NeurIPS 2019】贝叶斯元学习

专知会员服务

38+阅读 · 2019年12月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【ICML2019 Tutorials】元学习：从小样本学习到快速强化学习(Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning)，Google Brain的研究科学家| Chelsea Finn，加州大学伯克利分校| Sergey Levine

【ICML2019 Tutorials】元学习：从小样本学习到快速强化学习(Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning)，Google Brain的研究科学家| Chelsea Finn，加州大学伯克利分校| Sergey Levine

专知会员服务

55+阅读 · 2019年6月10日

【KDD2020-Tutorial】深度学习异常检测，180页ppt

【KDD2020-Tutorial】深度学习异常检测，180页ppt

专知

49+阅读 · 2020年8月28日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

【ICML2019】UC伯克利Pieter Abbeel教授强化学习教程-附59页slides

【ICML2019】UC伯克利Pieter Abbeel教授强化学习教程-附59页slides

专知

19+阅读 · 2019年6月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

软自适应无线视频传输的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

演化优化的自适应约束处理机理及在生化过程中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

e-Learner认知效率建模及自适应调整方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于贝叶斯联合模型的皮层脑机接口实现: 动作电位的实时检测、分类和解码

国家自然科学基金

0+阅读 · 2013年12月31日

移动Ad Hoc网络动态信任管理机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于多目标免疫算法和数学规划的双行设备布局方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于非因果稳定逆的柔性机械臂学习控制

国家自然科学基金

0+阅读 · 2012年12月31日

量子系统在噪声环境下的控制和优化

国家自然科学基金

0+阅读 · 2011年12月31日

基于"非监督-监督-激励"集成学习模式的机器人行为自主学习系统研究

国家自然科学基金

1+阅读 · 2010年12月31日

Adaptive Graph Contrastive Learning for Recommendation

Arxiv

0+阅读 · 2023年5月18日

Adaptive Learning based Upper-Limb Rehabilitation Training System with Collaborative Robot

Arxiv

0+阅读 · 2023年5月18日

Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年5月17日

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

Arxiv

0+阅读 · 2023年5月16日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Multi-Agent Simulation for AI Behaviour Discovery in Operations Research

Arxiv

40+阅读 · 2021年8月30日

Adaptive Transfer Learning on Graph Neural Networks

Arxiv

14+阅读 · 2021年7月20日

Multi-Domain Multi-Task Rehearsal for Lifelong Learning

Multi-Domain Multi-Task Rehearsal for Lifelong Learning

Arxiv

12+阅读 · 2020年12月14日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

VIP会员

文章信息

相关主题

逆强化学习

最新内容

从采集到决策：美军视角下的战术情报范式重构

从采集到决策：美军视角下的战术情报范式重构

专知会员服务

1+阅读 · 今天2:42

乌克兰“德尔塔”系统揭示无人机、数据与领导力如何重塑现代安全格局

乌克兰“德尔塔”系统揭示无人机、数据与领导力如何重塑现代安全格局

专知会员服务

1+阅读 · 今天2:37

大规模作战中的参谋流程：作为联合兵种作战组成部分的目标锁定

大规模作战中的参谋流程：作为联合兵种作战组成部分的目标锁定

专知会员服务

2+阅读 · 今天2:23

《北约概念开发与实验（CD&E）手册：概念开发者工具箱》100页手册

《北约概念开发与实验（CD&E）手册：概念开发者工具箱》100页手册

专知会员服务

5+阅读 · 今天2:21

《履带式无人地面战车技术发展现状》

《履带式无人地面战车技术发展现状》

专知会员服务

2+阅读 · 今天1:46

《美国空军B-2“幽灵”隐身轰炸机系统工程案例研究》117页

《美国空军B-2“幽灵”隐身轰炸机系统工程案例研究》117页

专知会员服务

5+阅读 · 8月1日

隐身技术前沿综述：物理机理、工程实践与战略展望

隐身技术前沿综述：物理机理、工程实践与战略展望

专知会员服务

4+阅读 · 8月1日

《多变海洋环境下无人水面艇与自主水下机器人对接的最优路径规划》

《多变海洋环境下无人水面艇与自主水下机器人对接的最优路径规划》

专知会员服务

3+阅读 · 8月1日

《以机反机：基于无人机载麦克风的空中周界入侵检测》

《以机反机：基于无人机载麦克风的空中周界入侵检测》

专知会员服务

4+阅读 · 8月1日

《无人机脆弱性利用：网络空间力量的新域》

《无人机脆弱性利用：网络空间力量的新域》

专知会员服务

2+阅读 · 8月1日

美空军如何将人工智能从战场部署至后方机关

美空军如何将人工智能从战场部署至后方机关

专知会员服务

11+阅读 · 7月31日

《美战争部指令文件：网络空间效应与使能能力测试评估》

《美战争部指令文件：网络空间效应与使能能力测试评估》

专知会员服务

8+阅读 · 7月31日

《史诗怒火行动：多域前瞻评估》49页报告

《史诗怒火行动：多域前瞻评估》49页报告

专知会员服务

7+阅读 · 7月31日

《英国防部：未来空战系统数字化战略》33页

《英国防部：未来空战系统数字化战略》33页

专知会员服务

5+阅读 · 7月31日

《面向自主飞行网络的智能体人工智能架构》

《面向自主飞行网络的智能体人工智能架构》

专知会员服务

7+阅读 · 7月31日

相关VIP内容

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

专知会员服务

46+阅读 · 2023年1月26日

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

24+阅读 · 2022年3月19日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

【ICML2021】授权驱动探索的元强化学习

专知会员服务

28+阅读 · 2021年5月24日

「元强化学习」报告，斯坦福Chelsea Finn讲解，52页ppt，Meta Reinforcement Learning

「元强化学习」报告，斯坦福Chelsea Finn讲解，52页ppt，Meta Reinforcement Learning

专知会员服务

43+阅读 · 2021年1月11日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

43+阅读 · 2020年4月11日

【斯坦福大学Chelsea Finn-NeurIPS 2019】贝叶斯元学习

【斯坦福大学Chelsea Finn-NeurIPS 2019】贝叶斯元学习

专知会员服务

38+阅读 · 2019年12月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【ICML2019 Tutorials】元学习：从小样本学习到快速强化学习(Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning)，Google Brain的研究科学家| Chelsea Finn，加州大学伯克利分校| Sergey Levine

【ICML2019 Tutorials】元学习：从小样本学习到快速强化学习(Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning)，Google Brain的研究科学家| Chelsea Finn，加州大学伯克利分校| Sergey Levine

专知会员服务

55+阅读 · 2019年6月10日

热门VIP内容

开通专知VIP会员享更多权益服务

乌克兰“德尔塔”系统揭示无人机、数据与领导力如何重塑现代安全格局

《北约概念开发与实验（CD&E）手册：概念开发者工具箱》100页手册

从采集到决策：美军视角下的战术情报范式重构

大规模作战中的参谋流程：作为联合兵种作战组成部分的目标锁定

相关资讯

【KDD2020-Tutorial】深度学习异常检测，180页ppt

【KDD2020-Tutorial】深度学习异常检测，180页ppt

专知

49+阅读 · 2020年8月28日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

【ICML2019】UC伯克利Pieter Abbeel教授强化学习教程-附59页slides

【ICML2019】UC伯克利Pieter Abbeel教授强化学习教程-附59页slides

专知

19+阅读 · 2019年6月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

相关论文

Adaptive Graph Contrastive Learning for Recommendation

Arxiv

0+阅读 · 2023年5月18日

Adaptive Learning based Upper-Limb Rehabilitation Training System with Collaborative Robot

Arxiv

0+阅读 · 2023年5月18日

Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年5月17日

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

Arxiv

0+阅读 · 2023年5月16日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Multi-Agent Simulation for AI Behaviour Discovery in Operations Research

Arxiv

40+阅读 · 2021年8月30日

Adaptive Transfer Learning on Graph Neural Networks

Arxiv

14+阅读 · 2021年7月20日

Multi-Domain Multi-Task Rehearsal for Lifelong Learning

Multi-Domain Multi-Task Rehearsal for Lifelong Learning

Arxiv

12+阅读 · 2020年12月14日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

相关基金

软自适应无线视频传输的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

演化优化的自适应约束处理机理及在生化过程中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

e-Learner认知效率建模及自适应调整方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于贝叶斯联合模型的皮层脑机接口实现: 动作电位的实时检测、分类和解码

国家自然科学基金

0+阅读 · 2013年12月31日

移动Ad Hoc网络动态信任管理机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于多目标免疫算法和数学规划的双行设备布局方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于非因果稳定逆的柔性机械臂学习控制

国家自然科学基金

0+阅读 · 2012年12月31日

量子系统在噪声环境下的控制和优化

国家自然科学基金

0+阅读 · 2011年12月31日

基于"非监督-监督-激励"集成学习模式的机器人行为自主学习系统研究

国家自然科学基金

1+阅读 · 2010年12月31日

微信扫码咨询专知VIP会员