Using messages from teammates can improve coordination in cooperative Multi-Agent Reinforcement Learning (MARL). Previous works typically combine teammates' raw messages with local information as input to the policy. However, neglecting message aggregation makes policy learning significantly less efficient. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in cooperative MARL. In this paper, we propose Multi-Agent communication via Self-supervised Information Aggregation (MASIA), in which agents aggregate the received messages into compact, highly relevant representations that augment the local policy. Specifically, we design a permutation-invariant message encoder that generates a common information-aggregated representation from the messages, and we optimize it in a self-supervised manner by reconstructing current information and predicting future information. Each agent then uses the most relevant parts of the aggregated representation for decision-making through a novel message extraction mechanism. Furthermore, considering the potential of offline learning for real-world applications, we build offline benchmarks for multi-agent communication, which are, to our knowledge, the first of their kind. Empirical results demonstrate the superiority of our method in both online and offline settings. We also release these offline benchmarks as a testbed for validating communication ability, to facilitate future research.
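To make the idea of a permutation-invariant message encoder concrete, here is a minimal sketch. It is not the MASIA encoder: the abstract does not specify the architecture, so this example assumes a simple shared linear map followed by mean pooling, and every name in it (`embed`, `aggregate`, `weights`, the dimensions) is hypothetical. It only illustrates the property named above, that the aggregated representation does not depend on the order in which messages arrive.

```python
import math
import random


def embed(message, weights):
    """Apply one shared linear map plus ReLU to a single message (a list of floats)."""
    return [max(0.0, sum(w * x for w, x in zip(row, message))) for row in weights]


def aggregate(messages, weights):
    """Mean-pool the per-message embeddings (assumed pooling choice, not from the paper)."""
    embedded = [embed(m, weights) for m in messages]
    n = len(embedded)
    return [sum(column) / n for column in zip(*embedded)]


# Hypothetical sizes: 5 teammates, 4-dimensional messages, 3-dimensional representation.
rng = random.Random(0)
msg_dim, rep_dim, n_agents = 4, 3, 5
weights = [[rng.uniform(-1, 1) for _ in range(msg_dim)] for _ in range(rep_dim)]
messages = [[rng.uniform(-1, 1) for _ in range(msg_dim)] for _ in range(n_agents)]

z = aggregate(messages, weights)

# Shuffle the message order and aggregate again.
shuffled = messages[:]
rng.shuffle(shuffled)
z_shuffled = aggregate(shuffled, weights)

# The two representations agree up to floating-point rounding.
order_invariant = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z, z_shuffled)
)
print(order_invariant)
```

Because the same weights are applied to every message and the pooling step is symmetric, all agents that receive the same set of messages obtain the same representation regardless of sender order. A learned encoder would replace the random weights with trained parameters and add the self-supervised objectives described in the abstract.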