Multi-agent reinforcement learning is a challenging and active field of research due to the inherent nonstationarity of the learning environment and the coupling between agents. A popular framework for modeling the interactions underlying a multi-agent RL problem is the Markov Game. A special class of Markov Games, termed Markov Potential Games, allows the Markov Game to be reduced to a single-objective optimal control problem whose objective is a potential function. In this work, we prove that a multi-agent collaborative field coverage problem, which arises in many engineering applications, can be formulated as a Markov Potential Game, and that a parameterized closed-loop Nash equilibrium can be learned by solving an equivalent single-objective optimal control problem. As a result, our algorithm trains 10x faster than a game-theoretic baseline and converges faster during policy execution.
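To make the reduction concrete, the defining property of a potential game is that any unilateral deviation changes the deviating agent's reward by exactly the change in the potential function, so maximizing the potential yields a Nash equilibrium. The following is a minimal sketch of this check on a hypothetical two-agent, two-action matrix game (the payoff numbers are illustrative assumptions, not from the paper):

```python
import numpy as np

# Potential function Phi(a1, a2) for a hypothetical 2x2 game.
phi = np.array([[3.0, 0.0],
                [1.0, 2.0]])
# Exact-potential construction: each agent's reward is Phi plus a term
# that depends only on the OTHER agent's action, so the extra term
# cancels under unilateral deviations.
r1 = phi + np.array([1.0, -1.0])     # extra term depends only on a2
r2 = phi + np.array([[0.0], [2.0]])  # extra term depends only on a1

def is_exact_potential(r1, r2, phi):
    """Check r_i(b_i, a_-i) - r_i(a_i, a_-i) == Phi(b_i, a_-i) - Phi(a_i, a_-i)."""
    for a1 in range(2):
        for a2 in range(2):
            for b1 in range(2):  # agent 1 deviates a1 -> b1
                if not np.isclose(r1[b1, a2] - r1[a1, a2],
                                  phi[b1, a2] - phi[a1, a2]):
                    return False
            for b2 in range(2):  # agent 2 deviates a2 -> b2
                if not np.isclose(r2[a1, b2] - r2[a1, a2],
                                  phi[a1, b2] - phi[a1, a2]):
                    return False
    return True

print(is_exact_potential(r1, r2, phi))  # True: Phi's maximizer is a Nash equilibrium
```

In the Markov Potential Game setting the same idea holds along trajectories: the potential becomes a function of state and joint policy, which is what lets the multi-objective game be solved as one optimal control problem.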