Real-world multi-agent tasks usually involve dynamic team composition with the emergence of roles, which should also be a key to efficient cooperation in multi-agent reinforcement learning (MARL). Drawing inspiration from the correlation between roles and agents' behavior patterns, we propose a novel framework of Attention-guided COntrastive Role representation learning for MARL (ACORM) to promote behavior heterogeneity, knowledge transfer, and skillful coordination across agents. First, we introduce mutual information maximization to formalize role representation learning, derive a contrastive learning objective, and concisely approximate the distribution of negative pairs. Second, we leverage an attention mechanism to prompt the global state to attend to learned role representations in value decomposition, implicitly guiding agent coordination in a skillful role space to yield more expressive credit assignment. Experiments and visualizations on challenging StarCraft II micromanagement tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches. Our code is available at https://github.com/NJU-RL/ACORM.
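To make the contrastive learning objective mentioned above more concrete, the following is a minimal sketch of a generic InfoNCE-style loss, the standard form that a mutual information lower bound usually takes. It is not the authors' implementation: the function name `info_nce_loss`, the cosine similarity, the `temperature` value, and the toy embeddings are all assumptions made for illustration, and the way ACORM actually selects positive and negative pairs is not specified here.

```python
import math


def info_nce_loss(queries, keys, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative sketch only).

    keys[i] is treated as the positive for queries[i]; every other key
    in the batch is treated as a negative. Similarity is cosine similarity
    scaled by a temperature (both are assumptions, not taken from the paper).
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]

    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over positive and negative logits.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive key as the target class.
        total += log_denominator - logits[i]
    return total / len(q)


# Hypothetical toy data: two "role" embeddings, each paired with a key.
role_embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned_keys = [[0.9, 0.1], [0.1, 0.9]]   # positives agree with their queries
swapped_keys = [[0.1, 0.9], [0.9, 0.1]]   # positives disagree with their queries

aligned_loss = info_nce_loss(role_embeddings, aligned_keys)
swapped_loss = info_nce_loss(role_embeddings, swapped_keys)
```

In this toy setup the loss is lower when each embedding is paired with a key from the same role than when the pairs are swapped, which is the behavior a role-discriminating representation is trained toward. When all keys are indistinguishable, the loss equals the logarithm of the batch size, reflecting that no information about the pairing has been captured.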