Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance as game dynamics shift within a prescribed uncertainty set. Solving RMGs remains under-explored, from problem formulation to the design of sample-efficient algorithms. A notorious yet open challenge is whether RMGs can escape the curse of multiagency, where the sample complexity scales exponentially with the number of agents. In this work, we propose a natural class of RMGs in which each agent's uncertainty set is shaped by both the environment and the other agents' strategies in a best-response manner. We first establish the well-posedness of these RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the robust CCE, whose sample complexity scales polynomially with all relevant problem parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs.
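To make the formulation concrete, one schematic way to write agent $i$'s robust value in a finite-horizon game is the following; the notation here is illustrative rather than taken verbatim from the paper:

\[
V_i^{\pi,\sigma_i}(s) \;=\; \inf_{P \,\in\, \mathcal{U}^{\sigma_i}\!\left(P^0,\, \pi_{-i}\right)} \mathbb{E}_{\pi,\, P}\!\left[\, \sum_{h=1}^{H} r_{i,h}(s_h, a_h) \;\Big|\; s_1 = s \right],
\]

where $P^0$ denotes the nominal transition kernel, $\pi_{-i}$ the joint policy of the other agents, and $\mathcal{U}^{\sigma_i}(P^0, \pi_{-i})$ agent $i$'s uncertainty set of radius $\sigma_i$, shaped jointly by the environment and the opponents' strategies in a best-response manner as described above. A robust NE (resp. CCE) is then a product (resp. correlated) policy from which no agent can improve its worst-case value by unilateral deviation.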