March 13, 2023

Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization

Carlo Alfano, Patrick Rebeschini

From arXiv. In the latest version we acknowledge concurrent work.

We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-linear policy parametrizations in infinite-horizon discounted Markov decision processes. In the deterministic case, when the Q-value is known and can be approximated by a linear combination of a known feature function up to a bias error, we show that a geometrically-increasing step size yields a linear convergence rate towards an optimal policy. We then consider the sample-based case, when the best representation of the Q-value function among linear combinations of a known feature function is known up to an estimation error. In this setting, we show that the algorithm enjoys the same linear guarantees as in the deterministic case up to an error term that depends on the estimation error, the bias error, and the condition number of the feature covariance matrix. Our results build upon the general framework of policy mirror descent and extend previous findings for the softmax tabular parametrization to the log-linear policy class.
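The deterministic setting described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a Q-NPG-style update (fit the exact Q-values by least squares on the features, then move the parameters along the fitted weights), uniform weighting over state-action pairs in the fit, and a step size that grows by a factor 1/gamma per iteration; the abstract specifies none of these details. The names `softmax_policy`, `q_values`, and `npg_log_linear`, and the random MDP, are hypothetical.

```python
import numpy as np


def softmax_policy(theta, Phi):
    """Log-linear policy: pi(a|s) is proportional to exp(theta . phi(s, a))."""
    logits = Phi @ theta                          # shape (S, A)
    logits -= logits.max(axis=1, keepdims=True)   # stabilise the exponentials
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def q_values(P, r, gamma, pi):
    """Exact Q^pi and V^pi for a finite MDP. P: (S, A, S), r: (S, A), pi: (S, A)."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)         # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)                   # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V, V


def npg_log_linear(P, r, Phi, gamma, n_iters=50, eta0=1.0):
    """Q-NPG-style iteration with a geometrically increasing step size (assumed form)."""
    S, A, d = Phi.shape
    X = Phi.reshape(S * A, d)                     # one feature row per (s, a) pair
    theta = np.zeros(d)
    eta = eta0
    values = []
    for _ in range(n_iters):
        pi = softmax_policy(theta, Phi)
        Q, V = q_values(P, r, gamma, pi)
        values.append(V.mean())                   # average value of the current policy
        # Best linear fit of Q by w . phi(s, a); the residual plays the role of
        # the bias error. Uniform weights over (s, a) are an assumption.
        w, *_ = np.linalg.lstsq(X, Q.reshape(S * A), rcond=None)
        theta = theta + eta * w                   # natural-gradient-style step
        eta /= gamma                              # assumed schedule: eta_{t+1} = eta_t / gamma
    return theta, np.array(values)


# Hypothetical random MDP with 5 states, 3 actions and 4-dimensional features.
rng = np.random.default_rng(0)
S, A, d, gamma = 5, 3, 4, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                 # normalise transition rows
r = rng.random((S, A))
Phi = rng.standard_normal((S, A, d))

theta, values = npg_log_linear(P, r, Phi, gamma)
print("average value at first and last iteration:", values[0], values[-1])
```

With one-hot features (d equal to the number of state-action pairs) the least-squares fit is exact and the update reduces to the softmax tabular case mentioned at the end of the abstract. With fewer features, the fit residual is nonzero and the final policy may remain suboptimal.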
