Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning - 专知论文

会员服务 ·

0

价值函数 · 估计/估计量 · 泛函 · Learning · 样本 ·

2023 年 2 月 2 日

Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning

翻译：利用深度强化学习中先验计算估计值函数加速策略梯度

Md Masudur Rahman,Yexiang Xue

This paper investigates the use of prior computation to estimate the value function to improve sample efficiency in on-policy policy gradient methods in reinforcement learning. Our approach is to estimate the value function from prior computations, such as from the Q-network learned in DQN or the value function trained for different but related environments. In particular, we learn a new value function for the target task while combining it with a value estimate from the prior computation. Finally, the resulting value function is used as a baseline in the policy gradient method. This use of a baseline has the theoretical property of reducing variance in gradient computation and thus improving sample efficiency. The experiments show the successful use of prior value estimates in various settings and improved sample efficiency in several tasks.

翻译：本文研究在强化学习的同策略策略梯度方法中，利用先验计算估计值函数以提高样本效率。我们的方法是从先验计算中估计值函数，例如从DQN中学习的Q网络或为不同但相关环境训练的值函数中获取。具体而言，我们为目标任务学习新的值函数，同时将其与先验计算得到的值估计相结合。最终的值函数被用作策略梯度方法中的基线。这种基线的使用具有降低梯度计算方差的特性，从而提升样本效率。实验表明，在各种设置下成功利用先验值估计，并在多个任务中提高了样本效率。

0

相关内容

价值函数

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

99+阅读 · 2019年12月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

聚精氨酸诱导肿瘤微环境的免疫活性及逆转cetuximab耐药性的调控机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于针刺对脑出血神经重塑作用的内质网应激与自噬稳态调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

符号数值混合计算中基于问题结构的算法和数值分析研究

国家自然科学基金

0+阅读 · 2013年12月31日

双靶点抑制c-met和VEGFR2治疗高侵袭性肝细胞癌及其机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

TRAF1在心肌梗死后心室重构中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Skutterudite/AgSbTe2系纳米复合热电材料研究

国家自然科学基金

0+阅读 · 2012年12月31日

Renin-Angiotensin System在介导机械通气所致肺微血管内皮细胞功能障碍中的作用及其机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

蓝光激发高效硅酸盐红/绿荧光粉的制备及发光机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

γ#27688;基丁酸通过肿瘤抗原TRAK1(MGb2-Ag)调控胃癌细胞生长的机制

国家自然科学基金

0+阅读 · 2009年12月31日

A Practical Survey on Faster and Lighter Transformers

Arxiv

0+阅读 · 2023年3月27日

A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

Arxiv

0+阅读 · 2023年3月26日

Sequential Knockoffs for Variable Selection in Reinforcement Learning

Arxiv

0+阅读 · 2023年3月24日

Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition

Arxiv

0+阅读 · 2023年3月24日

Reinforcement Learning on Graph: A Survey

Arxiv

67+阅读 · 2022年4月13日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

17+阅读 · 2018年6月27日

VIP会员

文章信息

相关主题

估计/估计量

最新内容

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

4+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

3+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

5+阅读 · 7月23日

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

2+阅读 · 7月23日

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

4+阅读 · 7月23日

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

10+阅读 · 7月22日

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

15+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

14+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

4+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

6+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

9+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

7+阅读 · 7月20日

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

99+阅读 · 2019年12月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于强化学习的自动化红队测试》

“天降毒雾”：无人机如何使化学战重返乌克兰战场

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A Practical Survey on Faster and Lighter Transformers

Arxiv

0+阅读 · 2023年3月27日

A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

Arxiv

0+阅读 · 2023年3月26日

Sequential Knockoffs for Variable Selection in Reinforcement Learning

Arxiv

0+阅读 · 2023年3月24日

Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition

Arxiv

0+阅读 · 2023年3月24日

Reinforcement Learning on Graph: A Survey

Arxiv

67+阅读 · 2022年4月13日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

17+阅读 · 2018年6月27日

相关基金

聚精氨酸诱导肿瘤微环境的免疫活性及逆转cetuximab耐药性的调控机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于针刺对脑出血神经重塑作用的内质网应激与自噬稳态调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

符号数值混合计算中基于问题结构的算法和数值分析研究

国家自然科学基金

0+阅读 · 2013年12月31日

双靶点抑制c-met和VEGFR2治疗高侵袭性肝细胞癌及其机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

TRAF1在心肌梗死后心室重构中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Skutterudite/AgSbTe2系纳米复合热电材料研究

国家自然科学基金

0+阅读 · 2012年12月31日

Renin-Angiotensin System在介导机械通气所致肺微血管内皮细胞功能障碍中的作用及其机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

蓝光激发高效硅酸盐红/绿荧光粉的制备及发光机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

γ#27688;基丁酸通过肿瘤抗原TRAK1(MGb2-Ag)调控胃癌细胞生长的机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员