This paper studies decentralized learning of coarse correlated equilibria (CCE) in aggregative Markov games (AMGs), where each agent's instantaneous reward depends only on its own action and an aggregate quantity. Existing CCE learning algorithms for general Markov games are not designed to exploit this aggregative structure, and research on decentralized CCE learning for AMGs remains limited. We propose an adaptive stage-based V-learning algorithm that exploits the aggregative structure under a fully decentralized information setting. Building on the two-timescale idea, the algorithm partitions learning into stages, adjusts stage lengths according to the variability of the aggregate signals, and uses no-regret updates within each stage. We prove that the algorithm achieves an $\epsilon$-approximate CCE in $O(S A_{\max} T^5 / \epsilon^2)$ episodes, avoiding the curse of multiagents that commonly arises in multi-agent reinforcement learning (MARL). Numerical results verify the theoretical findings, and the decentralized, model-free design makes the algorithm straightforward to extend to large-scale multi-agent scenarios.
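To make the phrase "no-regret updates within each stage" concrete, the following is a minimal sketch of one standard no-regret rule, exponential weights (Hedge), applied by a single agent whose reward depends only on its own action and an aggregate. It is not the paper's algorithm: the reward function `aggregative_reward`, the action grid, the step size `eta`, the full-information feedback, and the fixed aggregate within the toy stage are all assumptions made for illustration.

```python
import math


def hedge_update(weights, rewards, eta):
    """One exponential-weights (Hedge) step.

    Each action's probability is scaled by exp(eta * reward) and the
    result is renormalized to a probability distribution.
    """
    scaled = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(scaled)
    return [w / total for w in scaled]


def aggregative_reward(action, aggregate):
    """Hypothetical reward: depends only on the agent's own action and the aggregate."""
    return 1.0 - abs(action - aggregate)


# Toy stage: the agent observes a fixed aggregate signal (an assumption) and
# repeatedly updates its mixed policy over a small action grid.
actions = [0.0, 0.5, 1.0]
policy = [1.0 / len(actions)] * len(actions)
aggregate = 1.0

for _ in range(50):
    rewards = [aggregative_reward(a, aggregate) for a in actions]
    policy = hedge_update(policy, rewards, eta=0.5)
```

In this toy setting the policy concentrates on the action closest to the aggregate. The agent never needs the other agents' individual actions, only the aggregate signal, which is the structural property the abstract describes.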