Efficiently learning equilibria with large state and action spaces in general-sum Markov games while overcoming the curse of multi-agency is a challenging problem. Recent works have attempted to solve this problem by employing independent linear function classes to approximate the marginal $Q$-value of each agent. However, existing sample complexity bounds under such a framework have a suboptimal dependency on the desired accuracy $\varepsilon$ or on the action space. In this work, we introduce a new algorithm, Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local access to the simulator, i.e., one can interact with the underlying environment only at previously visited states. Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns $\varepsilon$-CCE with a provably optimal accuracy bound of $O(\varepsilon^{-2})$ and gets rid of the linear dependency on the action space, while scaling polynomially with the relevant problem parameters (such as the number of agents and the time horizon). Moreover, our analysis of Lin-Confident-FTRL generalizes the virtual policy iteration technique from the single-agent local planning literature, which yields a new computationally efficient algorithm with a tighter sample complexity bound when random access to the simulator is assumed.
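The FTRL component of the algorithm refers to the classical Follow-the-Regularized-Leader update that each agent runs over its own action set. As a minimal sketch (not the paper's full algorithm), the entropy-regularized FTRL step below is equivalent to exponential weights: each agent maintains a cumulative loss vector over its actions, derived here from hypothetical estimated marginal $Q$-losses, and plays the softmax of the negated cumulative losses. The function name and the learning rate `eta` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def ftrl_entropy_step(cum_losses, eta):
    """One FTRL step with an entropy regularizer (exponential weights):
    returns the distribution minimizing <p, cum_losses> - (1/eta) * H(p)."""
    logits = -eta * np.asarray(cum_losses, dtype=float)
    logits -= logits.max()            # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Illustrative loop: each agent updates its own cumulative marginal-loss
# vector each round; the losses here are random stand-ins for the
# estimated marginal Q-losses an actual implementation would compute.
rng = np.random.default_rng(0)
n_actions, T, eta = 5, 200, 0.1
cum_losses = np.zeros(n_actions)
for _ in range(T):
    policy = ftrl_entropy_step(cum_losses, eta)
    losses = rng.uniform(size=n_actions)
    cum_losses += losses
```

Because each agent updates only its marginal distribution, the per-round cost scales with an individual agent's action-set size rather than with the exponentially large joint action space, which is the usual route around the curse of multi-agency.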