We study a new class of Markov games (MGs), \textit{Multi-player Zero-sum Markov Games} with {\it Networked separable interactions} (MZNMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define an MZNMG as a model in which {the payoffs of the auxiliary games associated with each state are zero-sum and} have a separable (i.e., polymatrix) structure across neighbors on an interaction network. We first identify the necessary and sufficient conditions under which an MG can be represented as an MZNMG, and show that the set of Markov coarse correlated equilibria (CCE) collapses to the set of Markov Nash equilibria (NE) in these games, in that the {product of} the per-state marginals of the former over all players yields the latter. Furthermore, we show that finding an approximate Markov \emph{stationary} CCE in infinite-horizon discounted MZNMGs is \texttt{PPAD}-hard, unless the underlying network has a ``star topology''. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for MZNMGs, and establish convergence guarantees to a Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov \emph{non-stationary} NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results.
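As a rough sketch of the structure described above, in notation that is assumed here for illustration and not taken from the paper: let $\mathcal{N}$ be the set of players, $\mathcal{N}_i \subseteq \mathcal{N}$ the (hypothetical) neighbor set of player $i$ in the interaction network, and $Q_i(s, \cdot)$ the payoff of player $i$ in the auxiliary game at state $s$. The separable (polymatrix) and zero-sum conditions could then be written as
\[
Q_i(s, a) = \sum_{j \in \mathcal{N}_i} Q_{i,j}(s, a_i, a_j)
\qquad \text{and} \qquad
\sum_{i \in \mathcal{N}} Q_i(s, a) = 0
\quad \text{for every state } s \text{ and joint action } a = (a_i)_{i \in \mathcal{N}},
\]
where each pairwise term $Q_{i,j}$ depends only on the actions of players $i$ and $j$. Under this reading, a star topology would correspond to a network in which every edge is incident to a single central player.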