Partial observability and stochasticity in multi-agent settings can be mitigated by communication, which gives each agent more information about the others. However, the coordination problem remains: agents cannot communicate their actual actions to each other simultaneously because of circular dependencies. In this paper, we propose a novel multi-level communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (upper-level agents make decisions before lower-level ones) and has two communication phases. In the negotiation phase, agents determine the priority of decision-making by communicating hidden states of their observations and comparing the values of their intentions, which are obtained by modeling the environment dynamics. In the launching phase, the upper-level agents take the lead in making decisions and then communicate their actions to the lower-level agents. Theoretically, we prove that the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods on various cooperative multi-agent tasks.
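The two phases described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes each agent already holds a scalar intention value (in SeqComm this would come from a learned model of the environment dynamics), and the names `negotiate`, `launch`, `toy_policy`, and the `Policy` type are hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical policy type: maps (own observation, actions already chosen by
# upper-level agents) to an action. Not the paper's actual interface.
Policy = Callable[[float, Dict[int, int]], int]


def negotiate(intention_values: Sequence[float]) -> List[int]:
    """Negotiation phase (toy): order agents from highest to lowest priority.

    Assumption: each agent's intention value is given as a scalar.
    Ties are broken by agent index.
    """
    return sorted(range(len(intention_values)),
                  key=lambda i: (-intention_values[i], i))


def launch(order: Sequence[int], observations: Sequence[float],
           policies: Sequence[Policy]) -> Dict[int, int]:
    """Launching phase (toy): agents act in priority order."""
    chosen: Dict[int, int] = {}
    for agent in order:
        # Each agent sees only the actions of agents that decided before it.
        chosen[agent] = policies[agent](observations[agent], dict(chosen))
    return chosen


def toy_policy(observation: float, upper_actions: Dict[int, int]) -> int:
    """Toy rule: pick the smallest action in {0, 1, 2} not already taken."""
    taken = set(upper_actions.values())
    return next(a for a in range(3) if a not in taken)


order = negotiate([0.2, 0.9, 0.5])
actions = launch(order, [0.0, 0.0, 0.0], [toy_policy] * 3)
print(order)    # [1, 2, 0]
print(actions)  # {1: 0, 2: 1, 0: 2}
```

Because each agent conditions on the actions already announced by higher-priority agents, no two agents need to exchange actions at the same time, which is how a sequential ordering sidesteps the circular dependency mentioned above.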