Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning - 专知论文


September 19, 2022

Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning


Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng

From arXiv, 15 pages, 8 figures

Exploration is critical for deep reinforcement learning in complex environments with high-dimensional observations and sparse rewards. To address this problem, recent approaches leverage intrinsic rewards to improve exploration, such as novelty-based and prediction-based exploration. However, many intrinsic reward modules require sophisticated structures and representation learning, resulting in prohibitive computational complexity and unstable performance. In this paper, we propose Rewarding Episodic Visitation Discrepancy (REVD), a computation-efficient and quantified exploration method. More specifically, REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes. To estimate the divergence efficiently, a k-nearest neighbor estimator is used with a randomly initialized state encoder. Finally, REVD is tested on PyBullet Robotics Environments and Atari games. Extensive experiments demonstrate that REVD can significantly improve the sample efficiency of reinforcement learning algorithms and outperform the benchmarking methods.
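The abstract names two ingredients: a fixed, randomly initialized state encoder and a k-nearest neighbor estimate of the Rényi divergence between the states visited in two episodes. The sketch below illustrates that general idea in NumPy. It is not the authors' implementation: the linear encoder, the function names, the defaults `k=3` and `alpha=0.5`, and the bias-correction constant (a standard form for k-NN estimators of this kind) are all assumptions made for illustration, and the abstract does not say how the estimate is turned into a per-step reward.

```python
import math
import numpy as np

def random_encoder(obs_dim, latent_dim, seed=0):
    """Hypothetical encoder: a fixed random linear map that is never trained."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(obs_dim, latent_dim)) / np.sqrt(obs_dim)
    return lambda obs: np.asarray(obs, dtype=float) @ weights

def kth_nn_distance(queries, points, k, exclude_self=False):
    """Euclidean distance from each query to its k-th nearest neighbor in points."""
    diff = queries[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)
    # When queries and points are the same set, column 0 is the zero self-distance.
    return dist[:, k if exclude_self else k - 1]

def knn_renyi_divergence(x, y, k=3, alpha=0.5, eps=1e-8):
    """k-NN estimate of D_alpha(p || q) from samples x ~ p and y ~ q.

    Needs x.shape[0] > k, y.shape[0] >= k, and alpha != 1.
    Returns the scalar estimate and the per-sample terms it averages.
    """
    n, d = x.shape
    m = y.shape[0]
    rho = kth_nn_distance(x, x, k, exclude_self=True) + eps  # within-sample distance
    nu = kth_nn_distance(x, y, k) + eps                      # cross-sample distance
    # log of the k-NN density-ratio estimate q(x_i) / p(x_i), kept in log space
    log_ratio = math.log(n - 1) - math.log(m) + d * (np.log(rho) - np.log(nu))
    # Assumed multiplicative bias correction for k-NN estimators of this form.
    correction = math.gamma(k) ** 2 / (
        math.gamma(k - alpha + 1) * math.gamma(k + alpha - 1)
    )
    per_state = correction * np.exp((1.0 - alpha) * log_ratio)
    return math.log(per_state.mean()) / (alpha - 1.0), per_state
```

In use, the states of the current and previous episodes would each be passed through the same encoder, and the two encoded arrays given to `knn_renyi_divergence`. Two episodes covering the same region should yield an estimate near zero, and episodes covering different regions a larger one.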

