Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control - 专知论文

会员服务 ·

0

逆强化学习 · Learning · 控制器 · 端到端 · 强化学习 ·

2023 年 5 月 23 日

Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control

翻译：选项感知的对抗性逆强化学习在机器人控制中的应用

Jiayu Chen,Tian Lan,Vaneet Aggarwal

Hierarchical Imitation Learning (HIL) has been proposed to recover highly-complex behaviors in long-horizon tasks from expert demonstrations by modeling the task hierarchy with the option framework. Existing methods either overlook the causal relationship between the subtask and its corresponding policy or cannot learn the policy in an end-to-end fashion, which leads to suboptimality. In this work, we develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning and adapt it with the Expectation-Maximization algorithm in order to directly recover a hierarchical policy from the unannotated demonstrations. Further, we introduce a directed information term to the objective function to enhance the causality and propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion. Theoretical justifications and evaluations on challenging robotic control tasks are provided to show the superiority of our algorithm. The codes are available at https://github.com/LucasCJYSDL/HierAIRL.

翻译：层次化模仿学习（HIL）通过利用选项框架建模任务层次结构，从专家示范中恢复长时域任务中的高度复杂行为。现有方法要么忽略了子任务与其对应策略之间的因果关系，要么无法以端到端的方式学习策略，从而导致次优性。本文基于对抗性逆强化学习提出了一种新颖的HIL算法，并采用期望最大化算法进行适配，以直接从未标注示范中恢复层次化策略。此外，我们在目标函数中引入了一项有向信息项以增强因果关系，并提出了一个变分自编码器框架，以端到端的方式基于目标函数进行学习。我们提供了理论证明及在具有挑战性的机器人控制任务上的评估，展示了该算法的优越性。代码请见 https://github.com/LucasCJYSDL/HierAIRL。

0

相关内容

逆强化学习

逆强化学习

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

SIRT1介导的Resveratrol对糖尿病视网膜病变“代谢记忆”的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

非均质量子器件Schr？dinger-Poisson系统多尺度分析与算法研究

国家自然科学基金

0+阅读 · 2014年12月31日

单链saRNA加工和抑制效率的研究

国家自然科学基金

0+阅读 · 2014年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

MicroRNA调控BACE1在AD发病中的作用与机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

具有状态约束的Navier-Stokes方程的最优控制问题

国家自然科学基金

0+阅读 · 2013年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

桃果实中脱落酸受体ABAR的功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation

Arxiv

0+阅读 · 2023年7月10日

A User Study on Explainable Online Reinforcement Learning for Adaptive Systems

Arxiv

0+阅读 · 2023年7月9日

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Arxiv

0+阅读 · 2023年7月7日

Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning

Arxiv

0+阅读 · 2023年7月7日

A Survey on Reinforcement Learning for Recommender Systems

Arxiv

22+阅读 · 2021年9月22日

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Arxiv

12+阅读 · 2021年2月7日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

A Survey of Adversarial Learning on Graphs

Arxiv

38+阅读 · 2020年3月10日

Feature Denoising for Improving Adversarial Robustness

Feature Denoising for Improving Adversarial Robustness

Arxiv

15+阅读 · 2018年12月9日

VIP会员

文章信息

相关主题

逆强化学习

最新内容

ICML 2026 | SARDI：扩散语言模型的自增强检索

ICML 2026 | SARDI：扩散语言模型的自增强检索

专知会员服务

5+阅读 · 6月6日

长时程具身智能安全综述：机器人操作的跨层分析

长时程具身智能安全综述：机器人操作的跨层分析

专知会员服务

5+阅读 · 6月6日

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

专知会员服务

11+阅读 · 6月6日

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

专知会员服务

7+阅读 · 6月6日

《国防领域安全采用大语言模型的战略蓝图》

《国防领域安全采用大语言模型的战略蓝图》

专知会员服务

8+阅读 · 6月6日

《对抗性电磁环境下远程巡飞弹作战的保密指挥控制数据链》

《对抗性电磁环境下远程巡飞弹作战的保密指挥控制数据链》

专知会员服务

8+阅读 · 6月6日

CVPR2026奖项公布，谷歌D4RT最佳论文获奖，何恺明ResNet、YOLO获时间检验奖！

CVPR2026奖项公布，谷歌D4RT最佳论文获奖，何恺明ResNet、YOLO获时间检验奖！

专知会员服务

6+阅读 · 6月6日

ICML 2026 | 演化选择的因果建模

ICML 2026 | 演化选择的因果建模

专知会员服务

8+阅读 · 6月5日

综述｜学习式3D表征最新进展与趋势

综述｜学习式3D表征最新进展与趋势

专知会员服务

7+阅读 · 6月5日

《武器作战效能分析：基于虚拟构造仿真大数据与深度学习的初步见解》

《武器作战效能分析：基于虚拟构造仿真大数据与深度学习的初步见解》

专知会员服务

8+阅读 · 6月5日

《自主巡飞弹药系统量子逻辑框架：一种基于不确定模糊集的方法》

《自主巡飞弹药系统量子逻辑框架：一种基于不确定模糊集的方法》

专知会员服务

7+阅读 · 6月5日

人工智能重塑威慑：算法优势的兴起

人工智能重塑威慑：算法优势的兴起

专知会员服务

8+阅读 · 6月5日

【博士论文】基于物理结构与贝叶斯不确定性的可靠神经网络

【博士论文】基于物理结构与贝叶斯不确定性的可靠神经网络

专知会员服务

14+阅读 · 6月4日

AgentOps综述：智能体系统运维框架

AgentOps综述：智能体系统运维框架

专知会员服务

17+阅读 · 6月4日

《美陆军最新条令：兵力防护》

《美陆军最新条令：兵力防护》

专知会员服务

14+阅读 · 6月4日

相关VIP内容

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

长时程具身智能安全综述：机器人操作的跨层分析

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

ICML 2026 | SARDI：扩散语言模型的自增强检索

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation

Arxiv

0+阅读 · 2023年7月10日

A User Study on Explainable Online Reinforcement Learning for Adaptive Systems

Arxiv

0+阅读 · 2023年7月9日

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Arxiv

0+阅读 · 2023年7月7日

Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning

Arxiv

0+阅读 · 2023年7月7日

A Survey on Reinforcement Learning for Recommender Systems

Arxiv

22+阅读 · 2021年9月22日

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Arxiv

12+阅读 · 2021年2月7日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

A Survey of Adversarial Learning on Graphs

Arxiv

38+阅读 · 2020年3月10日

Feature Denoising for Improving Adversarial Robustness

Feature Denoising for Improving Adversarial Robustness

Arxiv

15+阅读 · 2018年12月9日

相关基金

SIRT1介导的Resveratrol对糖尿病视网膜病变“代谢记忆”的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

非均质量子器件Schr？dinger-Poisson系统多尺度分析与算法研究

国家自然科学基金

0+阅读 · 2014年12月31日

单链saRNA加工和抑制效率的研究

国家自然科学基金

0+阅读 · 2014年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

MicroRNA调控BACE1在AD发病中的作用与机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

具有状态约束的Navier-Stokes方程的最优控制问题

国家自然科学基金

0+阅读 · 2013年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

桃果实中脱落酸受体ABAR的功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员