In multi-agent reinforcement learning, a global objective is a powerful tool for incentivising cooperation. Unfortunately, training individual agents with a global reward is not sample-efficient, because the global reward does not necessarily correlate with an agent's individual actions. This problem can be solved by factorising the global value function into local value functions. Early work in this domain performed factorisation by conditioning local value functions purely on local information. Recently, it has been shown that providing both local information and an encoding of the global state can promote cooperative behaviour. In this paper we propose QGNN, the first value factorisation method to use a graph neural network (GNN) based model. The multi-layer message-passing architecture of QGNN provides more representational complexity than models in prior work, allowing it to produce a more effective factorisation. QGNN also introduces a permutation-invariant mixer which is able to match the performance of other methods, even with significantly fewer parameters. We evaluate our method against several baselines, including QMIX-Att, GraphMIX, QMIX, VDN, and hybrid architectures. Our experiments include StarCraft, the standard benchmark for credit assignment; Estimate Game, a custom environment that explicitly models inter-agent dependencies; and Coalition Structure Generation, a foundational problem with real-world applications. The results show that QGNN consistently outperforms state-of-the-art value factorisation baselines.
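To make the idea of value factorisation concrete, the following is a minimal sketch of the simplest additive (VDN-style) case, in which the global value is the sum of the chosen local values. It is not the QGNN model or its mixer; the function names and the toy Q-values are hypothetical and chosen only for illustration. It shows two properties that the abstract relies on: a sum is permutation invariant, and under an additive factorisation each agent's greedy local action also maximises the global value.

```python
from itertools import product


def greedy_local_actions(local_qs):
    """Each agent independently picks the action with its highest local Q-value."""
    return [max(range(len(q)), key=q.__getitem__) for q in local_qs]


def additive_mixer(chosen_qs):
    """VDN-style mixer (an assumption for this sketch, not QGNN's mixer):
    the global value is the sum of the chosen local values, so the result
    does not depend on the order in which agents are listed."""
    return sum(chosen_qs)


def q_tot(local_qs, joint_action):
    """Global value of a joint action under the additive factorisation."""
    return additive_mixer([q[a] for q, a in zip(local_qs, joint_action)])


# Hypothetical local Q-values: 3 agents, 2 actions each.
local_qs = [[0.5, 1.0], [2.0, -1.0], [0.25, 0.75]]

# Decentralised choice: every agent maximises its own local value.
decentralised = greedy_local_actions(local_qs)

# Centralised choice: brute-force search over all joint actions.
centralised = max(product(range(2), repeat=3), key=lambda ja: q_tot(local_qs, ja))

print(decentralised, list(centralised), q_tot(local_qs, decentralised))
```

In this toy case the decentralised and centralised choices coincide, which is what lets agents act on local values at execution time while still being trained against a global objective. Learned mixers such as those in QMIX replace the plain sum with a parameterised function.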