Cooperative Multi-Agent Reinforcement Learning (MARL) requires seamless collaboration among agents, which is often represented by an underlying relation graph. Existing methods for learning this graph focus primarily on agent-pair relations and neglect higher-order relationships. Several approaches extend cooperation modelling to capture behaviour similarities within groups, but they typically do not learn the latent graph at the same time, which limits information exchange among partially observed agents. To overcome these limitations, we present a novel approach that infers a Group-Aware Coordination Graph (GACG). The graph is designed to capture both the cooperation between agent pairs, based on current observations, and group-level dependencies, based on behaviour patterns observed across trajectories. It is then used in graph convolution to exchange information between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss that promotes cohesion within groups and encourages specialization between groups. Evaluations on StarCraft II micromanagement tasks demonstrate GACG's superior performance, and an ablation study provides experimental evidence for the effectiveness of each component of our method.
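The abstract names three mechanisms: a graph mixing pair-level and group-level edges, graph convolution over that graph, and a group distance loss. The NumPy sketch below illustrates how these pieces could fit together. It rests on our own assumptions and is not the GACG implementation. Pair weights come from scaled dot-product similarity of observation embeddings. Group membership is supplied as fixed integer labels, whereas GACG infers groups from behaviour across trajectories, which is not modelled here. The mixing weight `alpha` is hypothetical. The loss is taken as mean intra-group distance minus mean inter-group distance. All function names are hypothetical.

```python
import numpy as np

def pairwise_adjacency(obs_emb: np.ndarray) -> np.ndarray:
    """Hypothetical agent-pair weights: row-wise softmax of scaled dot products."""
    n, d = obs_emb.shape
    scores = obs_emb @ obs_emb.T / np.sqrt(d)
    np.fill_diagonal(scores, -np.inf)                     # no self-edges at pair level
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def group_adjacency(groups: np.ndarray) -> np.ndarray:
    """1.0 where two distinct agents share a group label, else 0.0."""
    same = (groups[:, None] == groups[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    return same

def combined_graph(obs_emb: np.ndarray, groups: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Assumed mix of pair-level and group-level edges, plus self-loops, row-normalised."""
    g = group_adjacency(groups)
    g_rows = g.sum(axis=1, keepdims=True)
    # Agents alone in their group get an all-zero group row instead of dividing by zero.
    g_norm = np.divide(g, g_rows, out=np.zeros_like(g), where=g_rows > 0)
    adj = alpha * pairwise_adjacency(obs_emb) + (1.0 - alpha) * g_norm
    adj = adj + np.eye(len(groups))
    return adj / adj.sum(axis=1, keepdims=True)

def graph_conv(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(adj @ feats @ weight, 0.0)

def group_distance_loss(behaviour_emb: np.ndarray, groups: np.ndarray) -> float:
    """Assumed form: mean intra-group distance minus mean inter-group distance.

    Lower values mean tighter groups that sit farther apart. Requires at least
    one group with two or more agents and at least two distinct groups.
    """
    diffs = behaviour_emb[:, None, :] - behaviour_emb[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    same = groups[:, None] == groups[None, :]
    off_diag = ~np.eye(len(groups), dtype=bool)
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return float(intra - inter)

# Usage with random inputs: four agents in two groups.
rng = np.random.default_rng(0)
obs = rng.normal(size=(4, 8))
groups = np.array([0, 0, 1, 1])
adj = combined_graph(obs, groups)
hidden = graph_conv(adj, obs, rng.normal(size=(8, 16)))
```

Minimising this loss pulls agents in the same group together in behaviour space and pushes different groups apart. That matches the cohesion and specialization goals stated above, though the exact distance measure used by GACG may differ.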