Can We Find Nash Equilibria at a Linear Rate in Markov Games?


March 3, 2023


Zhuoqing Song, Jason D. Lee, Zhuoran Yang

From arXiv; ICLR 2023

We study decentralized learning in two-player zero-sum discounted Markov games, where the goal is to design a policy optimization algorithm for either agent that satisfies two properties. First, the player does not need to know the opponent's policy to update its own. Second, when both players adopt the algorithm, their joint policy converges to a Nash equilibrium of the game. To this end, we construct a meta-algorithm, dubbed $\texttt{Homotopy-PO}$, which provably finds a Nash equilibrium at a global linear rate. In particular, $\texttt{Homotopy-PO}$ interweaves two base algorithms, $\texttt{Local-Fast}$ and $\texttt{Global-Slow}$, via homotopy continuation. $\texttt{Local-Fast}$ enjoys local linear convergence, while $\texttt{Global-Slow}$ converges globally but at a slower, sublinear rate. As the meta-algorithm switches between the two, $\texttt{Global-Slow}$ essentially serves as a "guide" that identifies a benign neighborhood in which $\texttt{Local-Fast}$ converges quickly. However, since the exact size of this neighborhood is unknown, we apply a doubling trick to switch between the two base algorithms. The switching scheme is carefully designed so that the aggregate performance of the algorithm is driven by $\texttt{Local-Fast}$. Furthermore, we prove that $\texttt{Local-Fast}$ and $\texttt{Global-Slow}$ can both be instantiated by variants of the optimistic gradient descent/ascent (OGDA) method, which is of independent interest.
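To make the OGDA building block mentioned in the abstract concrete, here is a minimal sketch of plain projected OGDA on a single-state zero-sum game (a matrix game), which can be viewed as the simplest special case of a Markov game. It is not the paper's $\texttt{Local-Fast}$ or $\texttt{Global-Slow}$ variant, and it does not implement the $\texttt{Homotopy-PO}$ switching scheme. The function names (`project_simplex`, `ogda_matrix_game`, `duality_gap`), the step size `eta`, the iteration count, and the matching-pennies payoff matrix are all assumptions chosen for illustration.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)


def ogda_matrix_game(A, x0, y0, eta=0.05, steps=5000):
    """Projected OGDA for min_x max_y x^T A y over probability simplices.

    Each player only uses its own feedback vector (A @ y or A.T @ x);
    it never reads the opponent's policy directly.
    """
    x_hat, y_hat = x0.copy(), y0.copy()
    gx_prev, gy_prev = A @ y0, A.T @ x0  # feedback from the previous round
    x, y = x0, y0
    for _ in range(steps):
        # Optimistic step: act using last round's feedback as a prediction.
        x = project_simplex(x_hat - eta * gx_prev)
        y = project_simplex(y_hat + eta * gy_prev)
        # Observe fresh feedback and update the base iterates.
        gx, gy = A @ y, A.T @ x
        x_hat = project_simplex(x_hat - eta * gx)
        y_hat = project_simplex(y_hat + eta * gy)
        gx_prev, gy_prev = gx, gy
    return x, y


def duality_gap(A, x, y):
    """max_y' x^T A y' - min_x' x'^T A y; zero exactly at a Nash equilibrium."""
    return np.max(A.T @ x) - np.min(A @ y)


# Illustrative example (assumed, not from the paper): matching pennies,
# whose unique Nash equilibrium is the uniform policy for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = ogda_matrix_game(A, np.array([0.9, 0.1]), np.array([0.2, 0.8]))
gap = duality_gap(A, x, y)
```

In this toy game the last iterate approaches the uniform policies and the duality gap shrinks toward zero. The decentralized flavor shows up in the update rule: the row player's step depends only on `A @ y`, and the column player's only on `A.T @ x`.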

