Learning to collaborate is critical in Multi-Agent Reinforcement Learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, which is typically characterized by Mutual Information (MI) in different forms. However, we reveal that sub-optimal collaborative behaviors also emerge with strong correlations, and that simply maximizing the MI can, surprisingly, hinder learning toward better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is to maximize the MI associated with superior collaborative behaviors and to minimize the MI associated with inferior ones. The two MI objectives play complementary roles: the first facilitates better collaboration, while the second avoids falling into sub-optimal collaboration. Experiments on a wide range of MARL benchmarks show that PMIC outperforms other algorithms.
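As a toy illustration of the criterion described in the abstract, the sketch below estimates the MI between discrete global states and joint actions with a simple plug-in (count-based) estimator, then combines two such estimates into an objective that rewards MI on superior behaviors and penalizes MI on inferior ones. The abstract does not specify how the MI is estimated or how the two objectives are weighted, so everything here is an assumption made for illustration: the function names `empirical_mi` and `pmic_style_objective`, the weight `beta`, and the hand-built sample sets are all hypothetical.

```python
import math
from collections import Counter


def empirical_mi(pairs):
    """Plug-in estimate of I(S; U) in nats from (state, joint_action) samples.

    Assumes discrete, hashable states and joint actions (an illustrative
    simplification, not something stated in the abstract).
    """
    n = len(pairs)
    joint = Counter(pairs)
    s_marg = Counter(s for s, _ in pairs)
    u_marg = Counter(u for _, u in pairs)
    mi = 0.0
    for (s, u), c in joint.items():
        # p(s,u) * log( p(s,u) / (p(s) * p(u)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (s_marg[s] * u_marg[u]))
    return mi


def pmic_style_objective(superior, inferior, beta=1.0):
    """Hypothetical combined objective: raise MI on superior samples,
    lower MI on inferior samples. `beta` is an assumed trade-off weight."""
    return empirical_mi(superior) - beta * empirical_mi(inferior)


# Two agents, two states. Both behaviors are deterministic functions of the
# state, so both are strongly correlated with it.
superior = [(0, (0, 0)), (1, (1, 1))] * 50   # assumed "good" joint actions
inferior = [(0, (1, 0)), (1, (0, 1))] * 50   # assumed "bad" joint actions
# Joint actions chosen independently of the state carry no information.
independent = [(0, (0, 0)), (0, (1, 1)), (1, (0, 0)), (1, (1, 1))] * 25

mi_superior = empirical_mi(superior)
mi_inferior = empirical_mi(inferior)
mi_independent = empirical_mi(independent)
objective = pmic_style_objective(superior, inferior)
```

In this toy setup the superior and inferior sample sets have the same MI, which mirrors the observation that strong correlation alone does not distinguish good collaboration from bad. Separating the two sets and treating their MI terms with opposite signs is what lets the objective tell them apart.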