Exploration in decentralized cooperative multi-agent reinforcement learning faces two challenges. First, the novelty of global states is unavailable, while the novelty of local observations is biased. Second, agents must find a way to explore in a coordinated manner. To address these challenges, we propose MACE, a simple yet effective multi-agent coordinated exploration method. By communicating only local novelty, agents can take other agents' local novelty into account to approximate the global novelty. Further, we introduce weighted mutual information to measure the influence of one agent's action on other agents' accumulated novelty. We convert it into an intrinsic reward in hindsight to encourage agents to exert more influence on other agents' exploration and boost coordinated exploration. Empirically, we show that MACE achieves superior performance in three multi-agent environments with sparse rewards.
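As a rough illustration of the idea of approximating global novelty from communicated local novelty, the sketch below has each agent keep its own novelty estimate and share only a scalar. The abstract does not say how novelty is measured or combined, so the count-based measure (1 / sqrt of the visit count), the summation, and the names `LocalNoveltyCounter` and `approx_global_novelty` are assumptions made for this example, not the method's actual definitions.

```python
from collections import Counter
from math import sqrt


class LocalNoveltyCounter:
    """Count-based novelty of one agent's local observations (assumed measure)."""

    def __init__(self):
        self.counts = Counter()

    def novelty(self, obs):
        # Record the visit, then return 1 / sqrt(visit count):
        # observations seen more often score as less novel.
        self.counts[obs] += 1
        return 1.0 / sqrt(self.counts[obs])


def approx_global_novelty(local_novelties):
    # Assumption: the communicated scalars are combined by summation.
    return sum(local_novelties)


# Two agents, each seeing only its own local observation (hypothetical data).
counters = [LocalNoveltyCounter() for _ in range(2)]
joint_obs = [("room", 0), ("room", 1)]

# Visiting the same joint situation twice lowers the approximated novelty.
first = approx_global_novelty(c.novelty(o) for c, o in zip(counters, joint_obs))
second = approx_global_novelty(c.novelty(o) for c, o in zip(counters, joint_obs))
print(first, second)
```

Each agent exchanges a single number per step rather than its observation, which is the communication pattern the abstract describes. The weighted mutual information reward is not sketched here because the abstract gives no formula for it.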