Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes - 专知论文

会员服务 ·

0

PMD · Markov · 确切的 · Processing（编程语言） · 优化器 ·

2023 年 5 月 30 日

Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

翻译：精确折扣马尔可夫决策过程中策略镜像下降的最优收敛速率

Emmeran Johnson,Ciara Pike-Burke,Patrick Rebeschini

from arxiv, 33 pages, 1 figure. New result (Theorem 3) on necessity of adaptive step-size

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, unregularised PMD algorithmically regularises the policy improvement step of PI without regularising the objective function. With exact policy evaluation, PI is known to converge linearly with a rate given by the discount factor $\gamma$ of a Markov Decision Process. In this work, we bridge the gap between PI and PMD with exact policy evaluation and show that the dimension-free $\gamma$-rate of PI can be achieved by the general family of unregularised PMD algorithms under an adaptive step-size. We show that both the rate and step-size are unimprovable for PMD: we provide matching lower bounds that demonstrate that the $\gamma$-rate is optimal for PMD methods as well as PI and that the adaptive step-size is necessary to achieve it. Our work is the first to relate PMD to rate-optimality and step-size necessity. Our study of the convergence of PMD avoids the use of the performance difference lemma, which leads to a direct analysis of independent interest. We also extend the analysis to the inexact setting and establish the first dimension-optimal sample complexity for unregularised PMD under a generative model, improving upon the best-known result.

翻译：策略镜像下降（PMD）是一类通用算法族，涵盖了强化学习中多种新颖且基础的方法。受非精确策略评估下策略迭代（PI）不稳定性问题的启发，未正则化PMD在算法层面隐式地对PI的策略改进步骤进行正则化，而不直接正则化目标函数。已知在精确策略评估条件下，PI以马尔可夫决策过程折扣因子γ决定的速率线性收敛。本文弥合了精确策略评估下PI与PMD之间的理论鸿沟，证明在自适应步长策略下，未正则化PMD算法族能够实现PI中与维度无关的γ-收敛速率。我们进一步证明该速率与步长策略对PMD而言均不可改进：通过建立匹配的下界，证实γ-速率对PMD方法与PI同样最优，且自适应步长是实现该速率的必要条件。本研究首次将PMD与速率最优性及步长必要性建立关联。我们避免使用性能差异引理来分析PMD收敛性，这种直接分析方法具有独立研究价值。此外，我们将分析扩展至非精确设置，在生成模型下建立了未正则化PMD的首个维度最优样本复杂度，改进了现有最优结果。

0

相关内容

PMD

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

47+阅读 · 2020年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

mTOR-STAT3-Notch信号通路介导的自噬在ALDH2改善阿尔茨海默病认知障碍中的作用

国家自然科学基金

0+阅读 · 2014年12月31日

Progranulin在糖尿病肾病足细胞损伤中的保护作用及分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

GIT1CC2结构域在保护脊髓缺血再灌注损伤（SCII）中的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

功能化石墨烯量子点合成与荧光传感

国家自然科学基金

0+阅读 · 2012年12月31日

15-kDa硒蛋白在内质网应激（ERS）和阿尔茨海默病(AD)中的功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

TM4SF1调控Collagen/DDR1信号通路促进乳腺癌转移的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

肾缺血再灌注损伤的MRI磁敏感成像研究

国家自然科学基金

0+阅读 · 2012年12月31日

乳腺癌化疗所致记忆障碍的脑机制及其康复的研究

国家自然科学基金

0+阅读 · 2011年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

强耦合等离子体X光光谱实验研究

国家自然科学基金

0+阅读 · 2008年12月31日

Adaptive FEM with quasi-optimal overall cost for nonsymmetric linear elliptic PDEs

Arxiv

0+阅读 · 2023年7月19日

Soft Cap for Eversion Robots

Arxiv

0+阅读 · 2023年7月19日

A simple and efficient convex optimization based bound-preserving high order accurate limiter for Cahn-Hilliard-Navier-Stokes system

Arxiv

0+阅读 · 2023年7月19日

Tight Distribution-Free Confidence Intervals for Local Quantile Regression

Arxiv

0+阅读 · 2023年7月17日

Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

Arxiv

0+阅读 · 2023年7月17日

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

Arxiv

0+阅读 · 2023年7月17日

Optimal Approximation Rates for Deep ReLU Neural Networks on Sobolev and Besov Spaces

Arxiv

0+阅读 · 2023年7月15日

Unconstrained Online Learning with Unbounded Losses

Arxiv

0+阅读 · 2023年7月14日

A combination technique for optimal control problems constrained by random PDEs

Arxiv

0+阅读 · 2023年7月14日

Online Convex Optimization with Stochastic Constraints: Zero Constraint Violation and Bandit Feedback

Arxiv

0+阅读 · 2023年7月13日

VIP会员

文章信息

相关主题

Processing（编程语言）

最新内容

面向国防作战的最佳自主与蜂群无人机技术

面向国防作战的最佳自主与蜂群无人机技术

专知会员服务

2+阅读 · 今天8:04

《异构人类团队的协作决策过程混合建模研究》

《异构人类团队的协作决策过程混合建模研究》

专知会员服务

3+阅读 · 今天7:59

《C5ISR系统中的注意力动态与自适应决策支持研究：视觉与多模态注意力引导对任务绩效影响的递归量化分析》最新36页报告

《C5ISR系统中的注意力动态与自适应决策支持研究：视觉与多模态注意力引导对任务绩效影响的递归量化分析》最新36页报告

专知会员服务

3+阅读 · 今天7:56

《设计思维中的人机协作：生成式人工智能对共情访谈影响的探究》140页

《设计思维中的人机协作：生成式人工智能对共情访谈影响的探究》140页

专知会员服务

3+阅读 · 今天7:50

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

3+阅读 · 7月27日

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

4+阅读 · 7月27日

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

11+阅读 · 7月27日

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

7+阅读 · 7月27日

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

6+阅读 · 7月27日

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

4+阅读 · 7月27日

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

10+阅读 · 7月27日

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

6+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

9+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

11+阅读 · 7月26日

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

47+阅读 · 2020年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《异构人类团队的协作决策过程混合建模研究》

《设计思维中的人机协作：生成式人工智能对共情访谈影响的探究》140页

面向国防作战的最佳自主与蜂群无人机技术

《C5ISR系统中的注意力动态与自适应决策支持研究：视觉与多模态注意力引导对任务绩效影响的递归量化分析》最新36页报告

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Adaptive FEM with quasi-optimal overall cost for nonsymmetric linear elliptic PDEs

Arxiv

0+阅读 · 2023年7月19日

Soft Cap for Eversion Robots

Arxiv

0+阅读 · 2023年7月19日

A simple and efficient convex optimization based bound-preserving high order accurate limiter for Cahn-Hilliard-Navier-Stokes system

Arxiv

0+阅读 · 2023年7月19日

Tight Distribution-Free Confidence Intervals for Local Quantile Regression

Arxiv

0+阅读 · 2023年7月17日

Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

Arxiv

0+阅读 · 2023年7月17日

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

Arxiv

0+阅读 · 2023年7月17日

Optimal Approximation Rates for Deep ReLU Neural Networks on Sobolev and Besov Spaces

Arxiv

0+阅读 · 2023年7月15日

Unconstrained Online Learning with Unbounded Losses

Arxiv

0+阅读 · 2023年7月14日

A combination technique for optimal control problems constrained by random PDEs

Arxiv

0+阅读 · 2023年7月14日

Online Convex Optimization with Stochastic Constraints: Zero Constraint Violation and Bandit Feedback

Arxiv

0+阅读 · 2023年7月13日

相关基金

mTOR-STAT3-Notch信号通路介导的自噬在ALDH2改善阿尔茨海默病认知障碍中的作用

国家自然科学基金

0+阅读 · 2014年12月31日

Progranulin在糖尿病肾病足细胞损伤中的保护作用及分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

GIT1CC2结构域在保护脊髓缺血再灌注损伤（SCII）中的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

功能化石墨烯量子点合成与荧光传感

国家自然科学基金

0+阅读 · 2012年12月31日

15-kDa硒蛋白在内质网应激（ERS）和阿尔茨海默病(AD)中的功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

TM4SF1调控Collagen/DDR1信号通路促进乳腺癌转移的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

肾缺血再灌注损伤的MRI磁敏感成像研究

国家自然科学基金

0+阅读 · 2012年12月31日

乳腺癌化疗所致记忆障碍的脑机制及其康复的研究

国家自然科学基金

0+阅读 · 2011年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

强耦合等离子体X光光谱实验研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员