Policy Gradient in Robust MDPs with Global Convergence Guarantee - 专知论文

会员服务 ·

0

稳健性 · 可约的 · 转移核 · Markov · 全局优化 ·

2023 年 6 月 7 日

Policy Gradient in Robust MDPs with Global Convergence Guarantee

翻译：鲁棒马尔可夫决策过程中的策略梯度方法与全局收敛性保证

Qiuhao Wang,Chin Pang Ho,Marek Petrik

Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but adapting these methods to RMDPs has been challenging. As a result, the applicability of RMDPs to large, practical domains remains limited. This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs. In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs. We introduce a novel parametric transition kernel and solve the inner loop robust policy via a gradient-based method. Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties.

翻译：鲁棒马尔可夫决策过程（RMDPs）为应对模型误差下计算可靠策略提供了有前景的框架。许多成功的强化学习算法基于策略梯度方法的变体，但将这些方法适配到RMDPs颇具挑战性。因此，RMDPs在大型实际应用场景中的适用性仍受到限制。本文提出了一种新的双循环鲁棒策略梯度（DRPG）算法——首个专为RMDPs设计的通用策略梯度方法。与先前的鲁棒策略梯度算法不同，DRPG通过单调减小近似误差，确保在表格型RMDPs中收敛到全局最优策略。我们引入了一种新颖的参数化转移核，并采用基于梯度的方法求解内部循环的鲁棒策略。最后，数值实验结果验证了新算法的实用性，并证实了其全局收敛特性。

0

相关内容

稳健性

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

草菇CRISPR/Cas9基因功能鉴定技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

靶向调节HDAC6增加t-PA静脉溶栓治疗的有效性及安全性研究

国家自然科学基金

0+阅读 · 2013年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Ghrelin对牛卵母细胞体外成熟的调控机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

前体mRNA剪切因子SF1磷酸化的结构与功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

哺乳动物体细胞克隆胚胎抗氧化机理的研究

国家自然科学基金

0+阅读 · 2012年12月31日

前列腺增生雄激素通路hub基因(SOX9-AR-NFKB1)的miRNA调控网络研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

用于生物质衍生物“#27700;相重整”#30340;金属/多孔金属氧化物双功能催化剂的设计与制备

国家自然科学基金

0+阅读 · 2009年12月31日

Error analysis of energy-conservative BDF2-FE scheme for the 2D Navier-Stokes equations with variable density

Arxiv

0+阅读 · 2023年7月28日

The Topology of Causality

Arxiv

0+阅读 · 2023年7月27日

Convergence of the numerical approximations and well-posedness: Nonlocal conservation laws with rough flux

Arxiv

0+阅读 · 2023年7月27日

Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

Arxiv

0+阅读 · 2023年7月27日

Client Selection in Federated Learning: Principles, Challenges, and Opportunities

Arxiv

0+阅读 · 2023年7月26日

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Arxiv

0+阅读 · 2023年7月26日

Source Condition Double Robust Inference on Functionals of Inverse Problems

Arxiv

0+阅读 · 2023年7月25日

Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor

Arxiv

0+阅读 · 2023年7月25日

Policy Gradient Algorithms Implicitly Optimize by Continuation

Arxiv

0+阅读 · 2023年7月25日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

14+阅读 · 2021年12月20日

VIP会员

文章信息

相关主题

最新内容

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

0+阅读 · 今天15:20

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

0+阅读 · 今天15:18

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

8+阅读 · 今天5:53

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

4+阅读 · 今天5:45

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

2+阅读 · 今天5:23

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

2+阅读 · 今天5:11

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

6+阅读 · 今天5:04

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

4+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

8+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

10+阅读 · 7月26日

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

专知会员服务

8+阅读 · 7月26日

《反无人机交战场景下的战斗归零研究》

《反无人机交战场景下的战斗归零研究》

专知会员服务

7+阅读 · 7月26日

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

专知会员服务

4+阅读 · 7月26日

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

5+阅读 · 7月25日

相关VIP内容

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

美空军新型反无人机部队初探

博士论文 | 面向大模型推理的内存高效算法

《无人系统互操作性导论——无人系统联合架构（JAUS）》

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Error analysis of energy-conservative BDF2-FE scheme for the 2D Navier-Stokes equations with variable density

Arxiv

0+阅读 · 2023年7月28日

The Topology of Causality

Arxiv

0+阅读 · 2023年7月27日

Convergence of the numerical approximations and well-posedness: Nonlocal conservation laws with rough flux

Arxiv

0+阅读 · 2023年7月27日

Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

Arxiv

0+阅读 · 2023年7月27日

Client Selection in Federated Learning: Principles, Challenges, and Opportunities

Arxiv

0+阅读 · 2023年7月26日

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Arxiv

0+阅读 · 2023年7月26日

Source Condition Double Robust Inference on Functionals of Inverse Problems

Arxiv

0+阅读 · 2023年7月25日

Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor

Arxiv

0+阅读 · 2023年7月25日

Policy Gradient Algorithms Implicitly Optimize by Continuation

Arxiv

0+阅读 · 2023年7月25日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

14+阅读 · 2021年12月20日

相关基金

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

草菇CRISPR/Cas9基因功能鉴定技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

靶向调节HDAC6增加t-PA静脉溶栓治疗的有效性及安全性研究

国家自然科学基金

0+阅读 · 2013年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Ghrelin对牛卵母细胞体外成熟的调控机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

前体mRNA剪切因子SF1磷酸化的结构与功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

哺乳动物体细胞克隆胚胎抗氧化机理的研究

国家自然科学基金

0+阅读 · 2012年12月31日

前列腺增生雄激素通路hub基因(SOX9-AR-NFKB1)的miRNA调控网络研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

用于生物质衍生物“#27700;相重整”#30340;金属/多孔金属氧化物双功能催化剂的设计与制备

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员