Reaching consensus is key to multi-agent coordination. To accomplish a cooperative task, agents need to coherently select optimal joint actions that maximize the team reward. However, current cooperative multi-agent reinforcement learning (MARL) methods usually do not explicitly take consensus into consideration, which may cause miscoordination. In this paper, we propose a model-based consensus mechanism to explicitly coordinate multiple agents. The proposed Multi-agent Goal Imagination (MAGI) framework guides agents to reach consensus with an imagined common goal. The common goal is an achievable state with high value, obtained by sampling from the distribution of future states. We model this distribution directly with a self-supervised generative model, thus alleviating the "curse of dimensionality" induced by the multi-agent multi-step policy rollouts commonly used in model-based methods. We show that such an efficient consensus mechanism can guide all agents to cooperatively reach valuable future states. Results on the Multi-agent Particle Environments and the Google Research Football environment demonstrate the superiority of MAGI in both sample efficiency and performance.
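As a rough illustration of the goal-selection idea described in the abstract (sample candidate future states, then keep one with high value), the following minimal sketch may help. It is not the MAGI implementation: the Gaussian sampler stands in for the learned self-supervised generative model, the distance-based value function stands in for a learned value estimate, and the names `sample_future_states`, `value`, and `imagine_common_goal` are hypothetical.

```python
import numpy as np


def sample_future_states(state, num_samples, rng, noise_scale=0.5):
    """Hypothetical stand-in for a learned generative model of future states.

    Assumption: candidates are drawn from a Gaussian centred on the current state.
    """
    noise = rng.standard_normal((num_samples, state.shape[0]))
    return state + noise_scale * noise


def value(states, target):
    """Toy value function (assumption): states closer to `target` score higher."""
    return -np.linalg.norm(states - target, axis=1)


def imagine_common_goal(state, target, num_samples=64, seed=0):
    """Sample candidate future states and return the highest-value one."""
    rng = np.random.default_rng(seed)
    candidates = sample_future_states(state, num_samples, rng)
    scores = value(candidates, target)
    best = int(np.argmax(scores))
    return candidates[best], scores[best], scores


state = np.zeros(4)   # shared global state (toy example)
target = np.ones(4)   # defines the toy value landscape
goal, goal_value, scores = imagine_common_goal(state, target)
print("imagined goal:", goal)
print("goal value:", goal_value)
```

In this sketch, every agent that calls `imagine_common_goal` with the same state and seed obtains the same goal, which is one simple way to picture a shared target for coordination. Sampling future states directly, rather than rolling out every agent's policy step by step, is the property the abstract credits with easing the curse of dimensionality.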