This work focuses on the credit assignment problem in cooperative multi-agent reinforcement learning (MARL). Sharing the global advantage among agents often leads to insufficient policy optimization, as it fails to capture the coalitional contributions of different agents. We revisit the policy update process from a coalitional perspective and propose CORA, an advantage allocation method guided by the core solution concept from cooperative game theory. CORA estimates coalition-wise advantages by evaluating the marginal contributions of different coalitions, incorporating clipped double Q-learning to mitigate overestimation bias. The core formulation enforces coalition-wise lower bounds on the allocated credits, so that coalitions with higher advantages yield stronger total incentives for their participating agents; this attributes the global advantage to different coalition strategies and promotes coordinated optimal behavior. To reduce computational overhead, we employ random coalition sampling to approximate the core allocation efficiently. Experiments on matrix games, differential games, and multi-agent collaboration benchmarks demonstrate that our method outperforms baselines. These findings highlight the importance of coalition-level credit assignment and cooperative game theory for advancing multi-agent learning.
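To make the sampling idea concrete, here is a minimal, hypothetical sketch (not CORA's actual implementation): given a black-box coalition value function, one can sample random coalitions and solve a small linear program for a least-core-style allocation, i.e., an allocation whose sampled coalition-wise lower bounds are violated by at most a slack `e`. The toy game, function names, and sample count below are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def coalition_value(S):
    """Toy superadditive coalition value (an assumption for this sketch)."""
    return len(S) ** 1.5

def sampled_least_core(n, value_fn, num_samples=10, seed=0):
    """Approximate a least-core allocation from randomly sampled coalitions.

    Solves:  min e  s.t.  sum_{i in S} x_i >= value_fn(S) - e  for sampled S,
                          sum_i x_i = value_fn(grand coalition).
    """
    rng = np.random.default_rng(seed)
    agents = np.arange(n)
    # Sample distinct nonempty proper coalitions (for n=4 there are only 14).
    coalitions = set()
    while len(coalitions) < num_samples:
        size = int(rng.integers(1, n))  # coalition size in 1 .. n-1
        S = tuple(sorted(rng.choice(agents, size=size, replace=False)))
        coalitions.add(S)
    # LP variables: [x_0, ..., x_{n-1}, e]; objective: minimize e.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    A_ub, b_ub = [], []
    for S in coalitions:
        row = np.zeros(n + 1)
        row[list(S)] = -1.0   # -sum_{i in S} x_i
        row[-1] = -1.0        # ... - e  <=  -value_fn(S)
        A_ub.append(row)
        b_ub.append(-value_fn(S))
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0         # efficiency: allocations sum to v(N), e excluded
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=[value_fn(tuple(agents))],
                  bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[-1]

x, e = sampled_least_core(4, coalition_value)
print("allocation:", x, "max violation e:", e)
```

In CORA the per-coalition values would instead come from learned coalition-wise advantage estimates, and sampling keeps the number of constraints linear in the sample budget rather than exponential in the number of agents.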