We introduce a class of networked Markov potential games in which agents are associated with nodes in a network. Each agent has its own local potential function, and the reward of each agent depends only on the states and actions of the agents within a neighborhood. In this context, we propose a localized actor-critic algorithm. The algorithm is scalable since each agent uses only local information and does not need access to the global state. Further, the algorithm overcomes the curse of dimensionality through the use of function approximation. Our main results provide finite-sample guarantees up to a localization error and a function approximation error. Specifically, we achieve an $\tilde{\mathcal{O}}(\tilde{\epsilon}^{-4})$ sample complexity measured by the averaged Nash regret. This is the first finite-sample bound for multi-agent competitive games that does not depend on the number of agents.