This paper proposes a new framework for studying multi-agent interactions in Markov games: the Markov $\alpha$-potential game. A Markov game is called a Markov $\alpha$-potential game if there exists a Markov potential game such that, under any unilateral policy deviation, the change in a player's value function in the given game and the corresponding change in the Markov potential game differ by at most $\alpha$. As a special case, Markov potential games are Markov $\alpha$-potential games with $\alpha=0$. The dependence of $\alpha$ on the game parameters is characterized explicitly for two practically relevant classes of games: Markov congestion games and perturbed Markov team games. For general Markov games, an optimization-based approach is introduced that computes the Markov potential game closest to the given game in terms of $\alpha$. The same approach can be used to verify whether a game is a Markov potential game and to provide a candidate potential function. Two algorithms, projected gradient ascent and sequential maximum one-stage improvement, are provided to approximate stationary Nash equilibria in Markov $\alpha$-potential games, together with the corresponding Nash-regret analysis. Numerical experiments demonstrate that simple algorithms are capable of finding approximate equilibria in Markov $\alpha$-potential games.
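To make the role of $\alpha$ concrete, the sketch below uses a deliberately simplified single-state (normal-form) setting rather than the paper's Markov formulation. It is an illustrative assumption, not the paper's method. A candidate potential function $\Phi$ stands in for the nearby potential game, and the helper `alpha_gap` (a hypothetical name) returns the largest mismatch between a player's payoff change and the change in $\Phi$ over all unilateral deviations. In a single-state game, checking pure action profiles is enough, because the mismatch is affine in each player's mixed strategy separately. The example shows a team game with $\alpha=0$ and a slightly perturbed team game with a small $\alpha$.

```python
from itertools import product


def alpha_gap(payoffs, potential, num_actions):
    """Largest |payoff change - potential change| over unilateral pure deviations.

    Single-state simplification (hypothetical helper, not the paper's algorithm):
      payoffs[i][a] : payoff of player i at joint action tuple a
      potential[a]  : candidate potential value at joint action a
      num_actions   : number of actions available to each player
    """
    n = len(num_actions)
    gap = 0.0
    for a in product(*(range(k) for k in num_actions)):
        for i in range(n):
            for dev in range(num_actions[i]):
                b = a[:i] + (dev,) + a[i + 1:]  # player i deviates, others fixed
                du = payoffs[i][b] - payoffs[i][a]  # change in player i's payoff
                dphi = potential[b] - potential[a]  # change in the potential
                gap = max(gap, abs(du - dphi))
    return gap


# A 2x2 team game: both players share the same payoff, so it is an exact
# potential game with the shared payoff as its potential.
shared = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
print(alpha_gap([shared, shared], shared, [2, 2]))  # 0.0

# Perturb player 1's payoff at one profile by 0.1: the team game's potential
# now fits only approximately, with a gap of 0.1.
perturbed = dict(shared)
perturbed[(1, 1)] = 2.1
print(round(alpha_gap([shared, perturbed], shared, [2, 2]), 6))  # 0.1
```

The perturbed example mirrors, in miniature, the perturbed-team-game intuition: small payoff perturbations produce a small $\alpha$ with respect to the unperturbed team potential.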