We study multi-agent reinforcement learning (MARL) for general-sum Markov games (MGs) under general function approximation. To identify the minimal assumption needed for sample-efficient learning, we introduce a novel complexity measure for general-sum MGs, the Multi-Agent Decoupling Coefficient (MADC). Using this measure, we propose the first unified algorithmic framework that ensures sample efficiency in learning Nash Equilibrium, Coarse Correlated Equilibrium, and Correlated Equilibrium for both model-based and model-free MARL problems with low MADC. We also show that our algorithm achieves sublinear regret comparable to that of existing works. Moreover, our algorithm combines an equilibrium-solving oracle with a single-objective optimization subprocedure that solves for the regularized payoff of each deterministic joint policy. This design avoids solving constrained optimization problems with data-dependent constraints (Jin et al. 2020; Wang et al. 2023) and executing sampling procedures that involve complex multi-objective optimization problems (Foster et al. 2023), which makes the algorithm more amenable to empirical implementation.
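The abstract specifies neither the equilibrium-solving oracle nor the form of the regularized payoff. The sketch below is a hedged illustration of the oracle step only, and it is not the paper's algorithm. It assumes the regularized payoffs of each deterministic joint policy have already been computed and stored in two payoff matrices. Under that assumption, the problem becomes a two-player normal-form game whose "actions" are deterministic policies. The sketch then approximates a Coarse Correlated Equilibrium by Hedge self-play, a standard no-regret method whose time-averaged joint play forms an approximate CCE. The function names (`hedge_cce`, `cce_gap`) and the example payoffs are hypothetical.

```python
import math


def softmax(scores, eta):
    """Exponential-weights distribution over scores (shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(eta * (s - top)) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def hedge_cce(payoff_a, payoff_b, rounds=2000, eta=0.05):
    """Approximate a CCE of a two-player normal-form game via Hedge self-play.

    payoff_a[i][j], payoff_b[i][j]: payoffs (assumed in [0, 1]) to players A and B
    when A plays deterministic policy i and B plays deterministic policy j.
    Returns the time-averaged joint distribution avg[i][j].
    """
    n, m = len(payoff_a), len(payoff_a[0])
    cum_a = [0.0] * n  # cumulative expected payoff of each of A's pure policies
    cum_b = [0.0] * m  # cumulative expected payoff of each of B's pure policies
    avg = [[0.0] * m for _ in range(n)]
    for _ in range(rounds):
        p = softmax(cum_a, eta)
        q = softmax(cum_b, eta)
        # Accumulate the product distribution played this round.
        for i in range(n):
            for j in range(m):
                avg[i][j] += p[i] * q[j] / rounds
        # Full-information update: score each pure policy against the opponent's mix.
        for i in range(n):
            cum_a[i] += sum(q[j] * payoff_a[i][j] for j in range(m))
        for j in range(m):
            cum_b[j] += sum(p[i] * payoff_b[i][j] for i in range(n))
    return avg


def cce_gap(avg, payoff_a, payoff_b):
    """Largest gain any player gets by ignoring the joint recommendation and
    committing to a fixed pure policy (0 means an exact CCE)."""
    n, m = len(avg), len(avg[0])
    val_a = sum(avg[i][j] * payoff_a[i][j] for i in range(n) for j in range(m))
    val_b = sum(avg[i][j] * payoff_b[i][j] for i in range(n) for j in range(m))
    marg_a = [sum(avg[i][j] for j in range(m)) for i in range(n)]  # A's marginal
    marg_b = [sum(avg[i][j] for i in range(n)) for j in range(m)]  # B's marginal
    best_a = max(sum(marg_b[j] * payoff_a[i][j] for j in range(m)) for i in range(n))
    best_b = max(sum(marg_a[i] * payoff_b[i][j] for i in range(n)) for j in range(m))
    return max(best_a - val_a, best_b - val_b)


if __name__ == "__main__":
    # Hypothetical regularized payoffs for two deterministic policies per player
    # (a matching-pennies structure, so no pure equilibrium exists).
    A = [[1.0, 0.0], [0.0, 1.0]]
    B = [[0.0, 1.0], [1.0, 0.0]]
    avg = hedge_cce(A, B)
    print("CCE gap:", cce_gap(avg, A, B))
```

Because Hedge has average regret of order sqrt(log n / T) for a well-tuned step size, the printed gap shrinks as `rounds` grows. An exact oracle would instead solve a linear program over joint distributions. The no-regret route is used here only because it stays dependency-free.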