Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance when game dynamics shift within a prescribed uncertainty set. Solving RMGs remains under-explored, from problem formulation to the development of sample-efficient algorithms. A notorious yet open challenge is whether RMGs can escape the curse of multiagency, where the sample complexity scales exponentially with the number of agents. In this work, we propose a natural class of RMGs where the uncertainty set of each agent is shaped by both the environment and the other agents' strategies in a best-response manner. We first establish the well-posedness of these RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the CCE whose sample complexity scales polynomially with all relevant parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs.
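To make the worst-case objective concrete, here is one standard way to formalize it; the notation is illustrative and not fixed by the abstract. Write $V_i^{\pi,P}$ for agent $i$'s value under the joint policy $\pi = (\pi_i, \pi_{-i})$ and transition kernel $P$, and let $\mathcal{U}_i(\pi_{-i})$ denote agent $i$'s uncertainty set, which in the class of RMGs proposed here may depend on the other agents' strategies $\pi_{-i}$. Each agent then evaluates a policy by its robust value
$$V_i^{\mathrm{rob}}(\pi) \;=\; \inf_{P \,\in\, \mathcal{U}_i(\pi_{-i})} V_i^{\pi,P},$$
and a robust Nash equilibrium is a joint policy $\pi^\star$ from which no agent can improve its robust value by deviating unilaterally:
$$V_i^{\mathrm{rob}}\bigl(\pi_i^\star, \pi_{-i}^\star\bigr) \;\ge\; V_i^{\mathrm{rob}}\bigl(\pi_i, \pi_{-i}^\star\bigr) \qquad \text{for all policies } \pi_i \text{ and all agents } i.$$
A robust CCE relaxes this by allowing the equilibrium to be a correlated joint policy, requiring only that no agent benefits from deviating to an independent policy of its own.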