Previous deep multi-agent reinforcement learning (MARL) algorithms have achieved impressive results, typically in homogeneous scenarios. However, heterogeneous scenarios are also common and usually harder to solve. In this paper, we study cooperative heterogeneous MARL problems in the StarCraft Multi-Agent Challenge (SMAC) environment. We first define and describe the heterogeneous problems in SMAC. To study the problem comprehensively, we add new heterogeneous maps to the original SMAC maps. We find that baseline algorithms fail to perform well on these heterogeneous maps. To address this issue, we propose the Grouped Individual-Global-Max Consistency (GIGM) and a novel MARL algorithm, Grouped Hybrid Q Learning (GHQ). GHQ separates agents into several groups and keeps individual parameters for each group, along with a novel hybrid structure for value factorization. To enhance coordination between groups, we maximize the Inter-group Mutual Information (IGMI) between the groups' trajectories. Experiments on the original and the new heterogeneous maps show that GHQ performs better than other state-of-the-art algorithms.