Convergence of Fast Policy Iteration in Markov Games and Robust MDPs - 专知论文

会员服务 ·

0

博弈 · 鞍点 · FT · 算法 · 鲁棒 ·

2025 年 11 月 16 日

Convergence of Fast Policy Iteration in Markov Games and Robust MDPs

翻译：马尔可夫博弈与鲁棒MDP中快速策略迭代的收敛性

Keith Badger,Jefferson Huang,Marek Petrik

Markov games and robust MDPs are closely related models that involve computing a pair of saddle point policies. As part of the long-standing effort to develop efficient algorithms for these models, the Filar-Tolwinski (FT) algorithm has shown considerable promise. As our first contribution, we demonstrate that FT may fail to converge to a saddle point and may loop indefinitely, even in small games. This observation contradicts the proof of FT's convergence to a saddle point in the original paper. As our second contribution, we propose Residual Conditioned Policy Iteration (RCPI). RCPI builds on FT, but is guaranteed to converge to a saddle point. Our numerical results show that RCPI outperforms other convergent algorithms by several orders of magnitude.

翻译：马尔可夫博弈与鲁棒马尔可夫决策过程（MDP）是密切相关的模型，其核心在于计算一对鞍点策略。作为长期致力于为这些模型开发高效算法的一部分，Filar-Tolwinski（FT）算法已展现出显著潜力。作为我们的第一项贡献，我们证明FT算法可能无法收敛至鞍点，甚至可能在小型博弈中无限循环。这一观察结果与原始论文中FT收敛至鞍点的证明相矛盾。作为第二项贡献，我们提出了残差条件策略迭代（RCPI）。RCPI基于FT算法构建，但能保证收敛至鞍点。数值实验表明，RCPI的性能优于其他收敛算法数个数量级。

0

相关内容

UnHiPPO：面向不确定性的状态空间模型初始化方法

UnHiPPO：面向不确定性的状态空间模型初始化方法

专知会员服务

11+阅读 · 2025年6月6日

【WWW2025】基于不确定性的图结构学习

【WWW2025】基于不确定性的图结构学习

专知会员服务

17+阅读 · 2025年2月20日

【NeurIPS2024】几何轨迹扩散模型

【NeurIPS2024】几何轨迹扩散模型

专知会员服务

24+阅读 · 2024年10月20日

【ICML2024】变分薛定谔扩散模型

【ICML2024】变分薛定谔扩散模型

专知会员服务

20+阅读 · 2024年5月11日

【WWW2024】博弈论式反事实解释图神经网络

【WWW2024】博弈论式反事实解释图神经网络

专知会员服务

32+阅读 · 2024年2月17日

【NeurIPS2022】黎曼扩散模型

【NeurIPS2022】黎曼扩散模型

专知会员服务

43+阅读 · 2022年9月15日

【NeurIPS2021】序一致因果图的多任务学习

【NeurIPS2021】序一致因果图的多任务学习

专知会员服务

20+阅读 · 2021年11月7日

【ICML2021】数据表示的几何评估

专知会员服务

38+阅读 · 2021年6月3日

AAAI 2021 | 稀疏胜负多智能体博弈中的纳什均衡解计算

专知会员服务

41+阅读 · 2021年2月12日

【NeurIPS2020】可处理的反事实推理的深度结构因果模型

【NeurIPS2020】可处理的反事实推理的深度结构因果模型

专知会员服务

49+阅读 · 2020年9月28日

【ICML2021】因果匹配领域泛化

【ICML2021】因果匹配领域泛化

专知

12+阅读 · 2021年8月12日

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

专知

20+阅读 · 2021年3月28日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

图节点嵌入(Node Embeddings)概述，9页pdf

图节点嵌入(Node Embeddings)概述，9页pdf

专知

15+阅读 · 2020年8月22日

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

专知

18+阅读 · 2020年6月22日

【KDD2020】XGNN-可解释图神经网络，从模型级解释构建可信赖GNN

【KDD2020】XGNN-可解释图神经网络，从模型级解释构建可信赖GNN

专知

17+阅读 · 2020年6月7日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

基于Amalgam空间的Hardy空间实变理论及其应用

国家自然科学基金

0+阅读 · 2017年12月31日

布尔可满足性算法和单调布尔函数的复杂性

国家自然科学基金

0+阅读 · 2015年12月31日

量子齐次空间上同调的非交换Hodge分解及形变意义

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

多组分格子波尔兹曼方法的数值分析

国家自然科学基金

0+阅读 · 2014年12月31日

一般误差分布下若干半参数模型的复合分位数方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于quantaloid-加载范畴的quantale值收敛理论

国家自然科学基金

1+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

变换结构方程模型的非参数贝叶斯分析

国家自然科学基金

4+阅读 · 2014年12月31日

Kahler 曲面中特殊曲面的研究

国家自然科学基金

0+阅读 · 2014年12月31日

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

Is ChatGPT a Good Recommender? A Preliminary Study

Arxiv

176+阅读 · 2023年4月20日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

111+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

231+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

A survey and taxonomy of loss functions in machine learning

Arxiv

28+阅读 · 2023年1月13日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Contextual and Position-Aware Factorization Machines for Sentiment Classification

Arxiv

13+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

最新内容

【博士论文】基于物理结构与贝叶斯不确定性的可靠神经网络

【博士论文】基于物理结构与贝叶斯不确定性的可靠神经网络

专知会员服务

1+阅读 · 6月4日

AgentOps综述：智能体系统运维框架

AgentOps综述：智能体系统运维框架

专知会员服务

1+阅读 · 6月4日

《美陆军最新条令：兵力防护》

《美陆军最新条令：兵力防护》

专知会员服务

4+阅读 · 6月4日

《军用物联网：架构、应用、挑战与现代战争中的战略意义》

《军用物联网：架构、应用、挑战与现代战争中的战略意义》

专知会员服务

3+阅读 · 6月4日

《人工智能的挑战：算法战的想象与现实》

《人工智能的挑战：算法战的想象与现实》

专知会员服务

4+阅读 · 6月4日

《自适应智能：融合数字孪生精准性与人工智能预见力，实现实时决策》

《自适应智能：融合数字孪生精准性与人工智能预见力，实现实时决策》

专知会员服务

5+阅读 · 6月4日

首场人工智能战争：Maven如何重塑武装冲突

首场人工智能战争：Maven如何重塑武装冲突

专知会员服务

4+阅读 · 6月4日

【博士论文】抽象信息论与安全奖励学习的数学发展

【博士论文】抽象信息论与安全奖励学习的数学发展

专知会员服务

7+阅读 · 6月3日

综述 | 机器人操作世界模型：预测、行动接口与学习生命周期

综述 | 机器人操作世界模型：预测、行动接口与学习生命周期

专知会员服务

5+阅读 · 6月3日

《推进军事决策支持：运用强化学习驱动仿真的稳健作战计划验证》

《推进军事决策支持：运用强化学习驱动仿真的稳健作战计划验证》

专知会员服务

10+阅读 · 6月3日

详解人工智能赋能战争的旗舰软件平台：Maven智能系统

详解人工智能赋能战争的旗舰软件平台：Maven智能系统

专知会员服务

19+阅读 · 6月3日

《发展用于决策支持的化生放核（CBRN）态势理解》

《发展用于决策支持的化生放核（CBRN）态势理解》

专知会员服务

8+阅读 · 6月3日

《通往人工通用智能之路上的均衡策略》

《通往人工通用智能之路上的均衡策略》

专知会员服务

7+阅读 · 6月3日

《人工智能与军事整合：现状与未来风险》报告

《人工智能与军事整合：现状与未来风险》报告

专知会员服务

5+阅读 · 6月3日

《Palantir的科技生态系统》

《Palantir的科技生态系统》

专知会员服务

17+阅读 · 6月2日

相关VIP内容

UnHiPPO：面向不确定性的状态空间模型初始化方法

UnHiPPO：面向不确定性的状态空间模型初始化方法

专知会员服务

11+阅读 · 2025年6月6日

【WWW2025】基于不确定性的图结构学习

【WWW2025】基于不确定性的图结构学习

专知会员服务

17+阅读 · 2025年2月20日

【NeurIPS2024】几何轨迹扩散模型

【NeurIPS2024】几何轨迹扩散模型

专知会员服务

24+阅读 · 2024年10月20日

【ICML2024】变分薛定谔扩散模型

【ICML2024】变分薛定谔扩散模型

专知会员服务

20+阅读 · 2024年5月11日

【WWW2024】博弈论式反事实解释图神经网络

【WWW2024】博弈论式反事实解释图神经网络

专知会员服务

32+阅读 · 2024年2月17日

【NeurIPS2022】黎曼扩散模型

【NeurIPS2022】黎曼扩散模型

专知会员服务

43+阅读 · 2022年9月15日

【NeurIPS2021】序一致因果图的多任务学习

【NeurIPS2021】序一致因果图的多任务学习

专知会员服务

20+阅读 · 2021年11月7日

【ICML2021】数据表示的几何评估

专知会员服务

38+阅读 · 2021年6月3日

AAAI 2021 | 稀疏胜负多智能体博弈中的纳什均衡解计算

专知会员服务

41+阅读 · 2021年2月12日

【NeurIPS2020】可处理的反事实推理的深度结构因果模型

【NeurIPS2020】可处理的反事实推理的深度结构因果模型

专知会员服务

49+阅读 · 2020年9月28日

热门VIP内容

开通专知VIP会员享更多权益服务

AgentOps综述：智能体系统运维框架

《军用物联网：架构、应用、挑战与现代战争中的战略意义》

【博士论文】基于物理结构与贝叶斯不确定性的可靠神经网络

《美陆军最新条令：兵力防护》

相关资讯

【ICML2021】因果匹配领域泛化

【ICML2021】因果匹配领域泛化

专知

12+阅读 · 2021年8月12日

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

专知

20+阅读 · 2021年3月28日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

图节点嵌入(Node Embeddings)概述，9页pdf

图节点嵌入(Node Embeddings)概述，9页pdf

专知

15+阅读 · 2020年8月22日

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

专知

18+阅读 · 2020年6月22日

【KDD2020】XGNN-可解释图神经网络，从模型级解释构建可信赖GNN

【KDD2020】XGNN-可解释图神经网络，从模型级解释构建可信赖GNN

专知

17+阅读 · 2020年6月7日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

相关论文

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

Is ChatGPT a Good Recommender? A Preliminary Study

Arxiv

176+阅读 · 2023年4月20日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

111+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

231+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

A survey and taxonomy of loss functions in machine learning

Arxiv

28+阅读 · 2023年1月13日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Contextual and Position-Aware Factorization Machines for Sentiment Classification

Arxiv

13+阅读 · 2018年1月18日

相关基金

基于Amalgam空间的Hardy空间实变理论及其应用

国家自然科学基金

0+阅读 · 2017年12月31日

布尔可满足性算法和单调布尔函数的复杂性

国家自然科学基金

0+阅读 · 2015年12月31日

量子齐次空间上同调的非交换Hodge分解及形变意义

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

多组分格子波尔兹曼方法的数值分析

国家自然科学基金

0+阅读 · 2014年12月31日

一般误差分布下若干半参数模型的复合分位数方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于quantaloid-加载范畴的quantale值收敛理论

国家自然科学基金

1+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

变换结构方程模型的非参数贝叶斯分析

国家自然科学基金

4+阅读 · 2014年12月31日

Kahler 曲面中特殊曲面的研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员