This paper studies finite-horizon Markov games in which the agents' dynamics are decoupled but their rewards may be coupled. The policy class is restricted to local policies, under which each agent makes decisions using only its local state. We first introduce the notion of smooth Markov games, which extends the smoothness argument for normal-form games to this setting, and use the smoothness property to bound the price of anarchy of the Markov game. For a specific class of Markov games, Markov potential games, we also develop a distributed learning algorithm, multi-agent soft policy iteration (MA-SPI), which provably converges to a Nash equilibrium, and we provide its sample complexity. Finally, we validate our results on a dynamic covering game.
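The normal-form smoothness argument that the abstract builds on is easiest to see in a one-shot game. A payoff-maximization game is (λ, μ)-smooth if Σ_i u_i(s*_i, s_{-i}) ≥ λ W(s*) − μ W(s) for all action profiles s and s*. Suppose also that utilities sum to at most the welfare W. Then every pure Nash equilibrium s satisfies W(s) ≥ Σ_i u_i(s) ≥ Σ_i u_i(s*_i, s_{-i}) ≥ λ W(s*) − μ W(s), so W(s) ≥ λ/(1+μ) W(s*).

The sketch below brute-forces a hypothetical three-agent static covering game with equal-share utilities. It checks (1, 1)-smoothness and compares the worst pure Nash equilibrium with the optimal welfare. The resource values, the action sets, and all function names are made up for illustration. This is not the paper's dynamic covering game, and it is not the paper's analysis.

```python
from fractions import Fraction
from itertools import product

# Hypothetical resource values and per-agent action sets (illustrative only).
VALUES = {"a": Fraction(4), "b": Fraction(3), "c": Fraction(2)}
ACTIONS = [("a", "b"), ("a", "c"), ("b", "c")]


def welfare(profile):
    """Total value of the distinct resources covered by the profile."""
    return sum(VALUES[r] for r in set(profile))


def utility(i, profile):
    """Equal-share utility: agent i splits its resource's value with co-coverers."""
    r = profile[i]
    return VALUES[r] / profile.count(r)


def deviate(profile, i, action):
    """Return a copy of the profile with agent i's action replaced."""
    return profile[:i] + (action,) + profile[i + 1:]


def is_pure_nash(profile):
    """No agent can strictly gain by a unilateral deviation."""
    return all(
        utility(i, deviate(profile, i, a)) <= utility(i, profile)
        for i, acts in enumerate(ACTIONS)
        for a in acts
    )


def is_smooth(lam, mu):
    """Check sum_i u_i(s*_i, s_-i) >= lam * W(s*) - mu * W(s) for all s, s*."""
    profiles = list(product(*ACTIONS))
    return all(
        sum(utility(i, deviate(s, i, s_star[i])) for i in range(len(ACTIONS)))
        >= lam * welfare(s_star) - mu * welfare(s)
        for s in profiles
        for s_star in profiles
    )


profiles = list(product(*ACTIONS))
opt = max(welfare(s) for s in profiles)
worst_ne = min(welfare(s) for s in profiles if is_pure_nash(s))
print("optimal welfare:", opt)
print("worst pure NE welfare:", worst_ne)
print("(1, 1)-smooth:", is_smooth(1, 1))
```

Equal-share covering games of this kind are valid utility games, so (1, 1)-smoothness and the resulting λ/(1+μ) = 1/2 guarantee are expected. In this instance the worst equilibrium has agents 0 and 1 sharing resource a and leaves resource c uncovered. Its welfare is 7 against an optimum of 9, which is above the bound. Exact `Fraction` arithmetic avoids floating-point ties in the equilibrium and smoothness comparisons.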