Policy Gradient Algorithms Implicitly Optimize by Continuation - 专知论文

会员服务 ·

0

优化器 · Continuity · 有向 · 泛函 · 总回报 ·

2023 年 7 月 25 日

Policy Gradient Algorithms Implicitly Optimize by Continuation

翻译：策略梯度算法通过延续隐式优化

Adrien Bolland,Gilles Louppe,Damien Ernst

Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification of these algorithms. First, we formulate direct policy optimization in the optimization by continuation framework. The latter is a framework for optimizing nonconvex functions where a sequence of surrogate objective functions, called continuations, are locally optimized. Second, we show that optimizing affine Gaussian policies and performing entropy regularization can be interpreted as implicitly optimizing deterministic policies by continuation. Based on these theoretical results, we argue that exploration in policy-gradient algorithms consists in computing a continuation of the return of the policy at hand, and that the variance of policies should be history-dependent functions adapted to avoid local extrema rather than to maximize the return of the policy.

翻译：强化学习中的直接策略优化通常通过策略梯度算法解决，该类算法利用随机梯度上升优化策略参数。本文为这些算法提供了新的理论解释与验证。首先，我们在优化延续框架下形式化直接策略优化问题——该框架通过局部优化一系列称为“延续”的替代目标函数来处理非凸函数优化。其次，我们证明优化仿射高斯策略并执行熵正则化可被解释为通过延续隐式优化确定性策略。基于这些理论结果，我们认为策略梯度算法中的探索本质在于计算当前策略回报的延续，而策略的方差应作为依赖于历史状态的函数进行自适应调整，其目的是避免局部极值而非最大化策略回报。

0

相关内容

优化器

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Single-Shot Object Detection with Enriched Semantics

Single-Shot Object Detection with Enriched Semantics

统计学习与视觉计算组

14+阅读 · 2018年8月29日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

“杰文斯”悖论、能效政策改进与“双控目标”分解

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

Adaptive Priority Reweighing for Generalizing Fairness Improvement

Arxiv

0+阅读 · 2023年9月15日

Gaussian Process Surrogate Models for Neural Networks

Arxiv

0+阅读 · 2023年9月14日

Bayesian Deep Learning for Graphs

Arxiv

23+阅读 · 2022年2月24日

Sequence Level Contrastive Learning for Text Summarization

Sequence Level Contrastive Learning for Text Summarization

Arxiv

14+阅读 · 2021年9月24日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

Learning Discrete Structures for Graph Neural Networks

Arxiv

18+阅读 · 2019年3月28日

A Survey on Deep Transfer Learning

A Survey on Deep Transfer Learning

Arxiv

11+阅读 · 2018年8月6日

Hierarchical Graph Representation Learning with Differentiable Pooling

Hierarchical Graph Representation Learning with Differentiable Pooling

Arxiv

15+阅读 · 2018年6月26日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

VIP会员

文章信息

相关主题

最新内容

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

1+阅读 · 7月25日

综述 | 遥感多模态大模型：领域专用还是通用模型？

综述 | 遥感多模态大模型：领域专用还是通用模型？

专知会员服务

1+阅读 · 7月25日

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

专知会员服务

4+阅读 · 7月25日

《决策模型比较研究》

《决策模型比较研究》

专知会员服务

9+阅读 · 7月25日

全球军事与武器工业中的人工智能：应用、方法与影响（万字长文）

全球军事与武器工业中的人工智能：应用、方法与影响（万字长文）

专知会员服务

5+阅读 · 7月25日

《美军水下战与海床战概述及本地实施》

《美军水下战与海床战概述及本地实施》

专知会员服务

6+阅读 · 7月25日

面向未来冲突推进陆军情报体制改革

面向未来冲突推进陆军情报体制改革

专知会员服务

4+阅读 · 7月25日

人工智能赋能无人机：俄乌冲突案例及其深远影响（万字长文）

人工智能赋能无人机：俄乌冲突案例及其深远影响（万字长文）

专知会员服务

5+阅读 · 7月25日

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

专知会员服务

11+阅读 · 7月24日

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

专知会员服务

8+阅读 · 7月24日

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

专知会员服务

5+阅读 · 7月24日

乌克兰纵深打击如何重塑俄罗斯的战略选择

乌克兰纵深打击如何重塑俄罗斯的战略选择

专知会员服务

3+阅读 · 7月24日

《分布式太空任务对比分析与综合建模及仿真环境》120页

《分布式太空任务对比分析与综合建模及仿真环境》120页

专知会员服务

4+阅读 · 7月24日

俄乌战争中关于中程打击无人机部署的经验启示

俄乌战争中关于中程打击无人机部署的经验启示

专知会员服务

5+阅读 · 7月24日

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

7+阅读 · 7月23日

相关VIP内容

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 遥感多模态大模型：领域专用还是通用模型？

《决策模型比较研究》

博士论文 | 用代码结构感知方法推进代码大模型

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Single-Shot Object Detection with Enriched Semantics

Single-Shot Object Detection with Enriched Semantics

统计学习与视觉计算组

14+阅读 · 2018年8月29日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

相关论文

Adaptive Priority Reweighing for Generalizing Fairness Improvement

Arxiv

0+阅读 · 2023年9月15日

Gaussian Process Surrogate Models for Neural Networks

Arxiv

0+阅读 · 2023年9月14日

Bayesian Deep Learning for Graphs

Arxiv

23+阅读 · 2022年2月24日

Sequence Level Contrastive Learning for Text Summarization

Sequence Level Contrastive Learning for Text Summarization

Arxiv

14+阅读 · 2021年9月24日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

Learning Discrete Structures for Graph Neural Networks

Arxiv

18+阅读 · 2019年3月28日

A Survey on Deep Transfer Learning

A Survey on Deep Transfer Learning

Arxiv

11+阅读 · 2018年8月6日

Hierarchical Graph Representation Learning with Differentiable Pooling

Hierarchical Graph Representation Learning with Differentiable Pooling

Arxiv

15+阅读 · 2018年6月26日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

相关基金

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

“杰文斯”悖论、能效政策改进与“双控目标”分解

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员