Markov Games with Decoupled Dynamics: Price of Anarchy and Sample Complexity


Markov · Games · Sample Complexity · Decoupling · Price

April 7, 2023


Runyu Zhang, Yuyang Zhang, Rohit Konda, Bryce Ferguson, Jason Marden, Na Li

This paper studies finite-horizon Markov games in which the agents' dynamics are decoupled but the rewards may be coupled across agents. The policy class is restricted to local policies, in which each agent makes decisions using only its local state. We first introduce the notion of smooth Markov games, which extends the smoothness argument for normal-form games to this setting, and use the smoothness property to bound the price of anarchy of the Markov game. For a specific class of Markov games called Markov potential games, we also develop a distributed learning algorithm, multi-agent soft policy iteration (MA-SPI), which provably converges to a Nash equilibrium, and we provide its sample complexity. Finally, we validate our results on a dynamic covering game.
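The price-of-anarchy bound above builds on the smoothness argument for normal-form games. As background only (this is the classical normal-form argument, not the paper's Markov-game version), assume a utility-maximization game whose welfare is the sum of utilities, W(a) = Σ_i u_i(a), and call it (λ, μ)-smooth if the first line below holds for all action profiles a and a*. Chaining smoothness with the Nash condition at an equilibrium a gives the welfare guarantee on the last line:

```latex
\begin{aligned}
&\text{Smoothness:}\quad \sum_{i} u_i(a_i^\star, a_{-i}) \;\ge\; \lambda\, W(a^\star) - \mu\, W(a) \qquad \forall\, a, a^\star, \\
&\text{Nash equilibrium } a:\quad u_i(a) \;\ge\; u_i(a_i^\star, a_{-i}) \qquad \forall\, i, \\
&\Rightarrow\quad W(a) = \sum_i u_i(a) \;\ge\; \sum_i u_i(a_i^\star, a_{-i}) \;\ge\; \lambda\, W(a^\star) - \mu\, W(a), \\
&\Rightarrow\quad W(a) \;\ge\; \frac{\lambda}{1+\mu}\, W(a^\star).
\end{aligned}
```

So every equilibrium achieves at least a λ/(1+μ) fraction of the optimal welfare, and the bound depends only on the smoothness constants, not on which equilibrium is reached.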

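The abstract validates the results on a dynamic covering game whose details are not given here. As a hedged illustration of what a price-of-anarchy computation involves, the sketch below brute-forces a hypothetical static (one-step) equal-share covering game: each agent picks one resource, and each covered resource r yields `values[r]`, split equally among the agents that chose it. The function names are illustrative, not from the paper. The returned ratio is the worst pure-equilibrium welfare divided by the optimal welfare; equal-share covering games are valid utility games, which are (1, 1)-smooth, so the ratio should be at least 1/2.

```python
from itertools import product


def covering_utilities(profile, values):
    """Hypothetical equal-share covering game: profile[i] is agent i's resource;
    each covered resource's value is split equally among the agents on it."""
    counts = {}
    for r in profile:
        counts[r] = counts.get(r, 0) + 1
    return [values[r] / counts[r] for r in profile]


def welfare(profile, values):
    # Total value of the distinct covered resources (equals the sum of utilities).
    return sum(values[r] for r in set(profile))


def is_pure_nash(profile, values, actions):
    # A profile is a pure Nash equilibrium if no agent gains by switching resource.
    utils = covering_utilities(profile, values)
    for i in range(len(profile)):
        for alt in actions:
            deviated = list(profile)
            deviated[i] = alt
            # Small tolerance so floating-point ties are not counted as gains.
            if covering_utilities(deviated, values)[i] > utils[i] + 1e-12:
                return False
    return True


def price_of_anarchy(values, n_agents):
    """Worst pure-equilibrium welfare divided by optimal welfare (brute force)."""
    actions = range(len(values))
    profiles = list(product(actions, repeat=n_agents))
    best = max(welfare(p, values) for p in profiles)
    worst_ne = min(welfare(p, values) for p in profiles
                   if is_pure_nash(p, values, actions))
    return worst_ne / best
```

For `values = [1.0, 0.4]` with two agents, both agents crowding onto the high-value resource is the only pure equilibrium (welfare 1.0), while spreading out yields 1.4, so the ratio is 1/1.4, roughly 0.71. Brute force is exponential in the number of agents and only serves to make the definition concrete.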

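MA-SPI itself is defined for finite-horizon Markov games with local states, and its exact update is not reproduced here. A loosely related one-step analogue, assumed purely for illustration, is sequential soft (entropy-regularized) best responses in a normal-form potential game: each agent in turn replaces its mixed policy with the softmax of its expected utilities at temperature `tau`. In a potential game, each such update exactly maximizes the entropy-regularized potential over that agent's policy, so the regularized potential never decreases. The covering game, its Rosenthal potential, and all function names below are hypothetical.

```python
import math
from itertools import product


def utilities(profile, values):
    # Hypothetical equal-share covering game: a resource's value is split among its users.
    counts = {}
    for r in profile:
        counts[r] = counts.get(r, 0) + 1
    return [values[r] / counts[r] for r in profile]


def potential(profile, values):
    # Rosenthal potential: sum_r values[r] * (1 + 1/2 + ... + 1/n_r).
    counts = {}
    for r in profile:
        counts[r] = counts.get(r, 0) + 1
    return sum(values[r] * sum(1.0 / k for k in range(1, n + 1))
               for r, n in counts.items())


def expected_utility(i, action, policies, values):
    """Expected utility of agent i playing `action` while the others follow `policies`."""
    n = len(policies)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for joint in product(range(len(values)), repeat=n - 1):
        prob = 1.0
        profile = [0] * n
        profile[i] = action
        for j, a in zip(others, joint):
            prob *= policies[j][a]
            profile[j] = a
        total += prob * utilities(profile, values)[i]
    return total


def soft_best_response(i, policies, values, tau):
    # Softmax of expected utilities; subtracting the max keeps exp() stable.
    q = [expected_utility(i, a, policies, values) for a in range(len(values))]
    m = max(q)
    w = [math.exp((x - m) / tau) for x in q]
    z = sum(w)
    return [x / z for x in w]


def regularized_potential(policies, values, tau):
    # E[potential] under the product policy plus tau times the sum of policy entropies.
    exp_phi = 0.0
    for profile in product(range(len(values)), repeat=len(policies)):
        prob = 1.0
        for j, a in enumerate(profile):
            prob *= policies[j][a]
        exp_phi += prob * potential(profile, values)
    entropy = -sum(p * math.log(p) for pol in policies for p in pol if p > 0)
    return exp_phi + tau * entropy


def soft_iteration(values, n_agents, tau=0.1, sweeps=50):
    """Sequential soft best responses from uniform policies; records the
    regularized potential after each full sweep over the agents."""
    k = len(values)
    policies = [[1.0 / k] * k for _ in range(n_agents)]
    history = [regularized_potential(policies, values, tau)]
    for _ in range(sweeps):
        for i in range(n_agents):
            policies[i] = soft_best_response(i, policies, values, tau)
        history.append(regularized_potential(policies, values, tau))
    return policies, history
```

With `values = [1.0, 0.4]`, two agents and `tau = 0.1`, both policies settle near a mixed fixed point that puts most of the mass on resource 0, and `history` is non-decreasing. That monotone potential is the kind of property a convergence argument for potential games relies on; the sample-complexity side of the paper (learning from samples rather than exact expectations) is not modeled here.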
