Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention

Traditional multi-agent reinforcement learning algorithms are difficult to apply in large-scale multi-agent environments. In recent years, the introduction of mean field theory has improved the scalability of multi-agent reinforcement learning. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range. This partial observability limits the agent's ability to assess the quality of the actions of surrounding agents. This paper focuses on developing a method that captures more effective information from local observations in order to select more effective actions. Previous work in this field employs probability distributions or a weighted mean field to update the average action of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors and thus leads to a local optimum. To remedy this flaw, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph-Attention (GAMFQ). GAMFQ uses a graph attention module and a mean field module to describe how an agent is influenced by the actions of other agents at each time step. The graph attention module consists of a graph attention encoder and a differentiable attention mechanism; the mechanism outputs a dynamic graph that represents how effective each neighborhood agent is for the central agent. The mean field module approximates the effect of the neighborhood agents on a central agent as the average effect of the effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgents framework. Experiments show that GAMFQ outperforms the baselines, including state-of-the-art partially observable mean field reinforcement learning algorithms.
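The idea of averaging only over effective neighbors can be illustrated with a minimal sketch. It is not the GAMFQ implementation: the scaled dot-product scoring, the hard threshold of 1/N, and all names (`effective_mean_action`, `center_feat`, `neighbor_feats`) are assumptions made for illustration, standing in for the graph attention encoder and the differentiable attention mechanism described above.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def effective_mean_action(center_feat, neighbor_feats, neighbor_actions, n_actions):
    """Mean one-hot action over the neighbors judged 'effective'.

    Assumption: effectiveness is scored with scaled dot-product attention
    between the central agent's features and each observed neighbor's
    features, and a neighbor is kept when its weight is at least the
    uniform weight 1/N. Both choices are illustrative stand-ins.
    """
    center_feat = np.asarray(center_feat, dtype=float)
    neighbor_feats = np.asarray(neighbor_feats, dtype=float)
    neighbor_actions = np.asarray(neighbor_actions, dtype=int)

    n_neighbors = neighbor_feats.shape[0]
    if n_neighbors == 0:
        # No agent inside the observation range: no mean action to report.
        return np.zeros(n_actions)

    # Attention weights of the central agent over its observed neighbors.
    scores = neighbor_feats @ center_feat / np.sqrt(center_feat.shape[0])
    weights = softmax(scores)

    # Hard mask playing the role of one row of the dynamic graph.
    # The largest weight is never below 1/N, so the mask is never empty.
    mask = weights >= (1.0 / n_neighbors) - 1e-12

    # Mean field step: average the one-hot actions of the kept neighbors.
    one_hot = np.eye(n_actions)[neighbor_actions]
    return one_hot[mask].mean(axis=0)


if __name__ == "__main__":
    center = [1.0, 0.0]
    feats = [[4.0, 0.0], [-4.0, 0.0], [4.0, 0.0]]  # hypothetical neighbor features
    actions = [0, 1, 2]
    print(effective_mean_action(center, feats, actions, n_actions=3))
```

In this toy input the second neighbor receives a near-zero attention weight, so its action is excluded from the average, whereas a plain mean field would weight all three neighbors equally.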
