In cooperative multi-agent reinforcement learning (MARL), agents collaborate to achieve common goals, such as defeating enemies or scoring a goal. However, in complex tasks, learning goal-reaching paths toward such a semantic goal takes a considerable amount of time, and the trained model often fails to find such paths. To address this, we present LAtent Goal-guided Multi-Agent reinforcement learning (LAGMA), which generates a goal-reaching trajectory in latent space and provides a latent goal-guided incentive for transitions toward this reference trajectory. LAGMA consists of three major components: (a) a quantized latent space constructed via a modified VQ-VAE for efficient sample utilization, (b) goal-reaching trajectory generation via an extended VQ codebook, and (c) latent goal-guided intrinsic reward generation to encourage transitions toward the sampled goal-reaching path. The proposed method is evaluated on StarCraft II, in both dense and sparse reward settings, and on Google Research Football. Empirical results show further performance improvements over state-of-the-art baselines.
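The three components can be illustrated with a minimal, pure-Python sketch. It is not LAGMA's implementation. It assumes: a squared-L2 nearest-codebook lookup (the standard VQ-VAE quantization step); a hypothetical store that keeps, for each starting code, the highest-return code sequence seen so far as the reference trajectory; and a fixed `bonus` paid when the quantized next state lands on the reference code for the next step. The names `nearest_code`, `update_reference`, `latent_goal_reward`, and `bonus` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Codebook = List[Sequence[float]]


def nearest_code(z: Sequence[float], codebook: Codebook) -> int:
    """(a) Quantize latent z to the index of its nearest codebook vector (squared L2)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(z, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def update_reference(store: Dict[int, Tuple[float, List[int]]],
                     code_seq: Sequence[int], ret: float) -> None:
    """(b) Hypothetical rule: keep the highest-return code sequence per starting code."""
    start = code_seq[0]
    if start not in store or ret > store[start][0]:
        store[start] = (ret, list(code_seq))


def latent_goal_reward(z_next: Sequence[float], reference: List[int], t: int,
                       codebook: Codebook, bonus: float = 0.1) -> float:
    """(c) Hypothetical intrinsic reward: pay `bonus` if the quantized next
    latent matches the reference trajectory's code at step t + 1."""
    if t + 1 >= len(reference):
        return 0.0  # past the end of the reference trajectory
    return bonus if nearest_code(z_next, codebook) == reference[t + 1] else 0.0


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    store: Dict[int, Tuple[float, List[int]]] = {}
    update_reference(store, [0, 1, 2], ret=1.0)
    update_reference(store, [0, 2, 2], ret=0.5)  # lower return, not kept
    reference = store[0][1]
    print(latent_goal_reward([0.9, 1.2], reference, t=0, codebook=codebook))
```

Quantizing first means that the reference-trajectory bookkeeping and the reward check operate on discrete code indices rather than on continuous latents. This is one plausible reading of why a quantized latent space aids sample utilization.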