Communication can substantially improve cooperation in multi-agent reinforcement learning (MARL), especially in partially observed tasks. However, existing works either broadcast messages, which leads to information redundancy, or learn targeted communication by modeling all other agents as potential targets, which does not scale when the number of agents varies. In this work, to tackle the scalability problem of MARL communication in partially observed tasks, we propose a novel framework, the Transformer-based Email Mechanism (TEM). Agents adopt local communication, sending messages only to the agents they can observe, without modeling all agents. Inspired by how humans cooperate through email forwarding, we design message chains that forward information so that agents can cooperate with others outside their observation range. We use a Transformer to encode and decode the message chain and selectively choose the next receiver. Empirically, TEM outperforms the baselines on multiple cooperative MARL benchmarks. When the number of agents varies, TEM maintains superior performance without further training.
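The message-chain idea described in the abstract can be illustrated with a minimal sketch. It is not the TEM implementation: the 2D positions, the `obs_range` radius, the `max_hops` limit, the rule that an agent is not revisited, and the hand-written `score` function (a stand-in for the Transformer that chooses the next receiver) are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]


def observable_agents(sender: int, positions: Dict[int, Position],
                      obs_range: float) -> List[int]:
    """Ids of agents within obs_range of the sender, excluding the sender."""
    sx, sy = positions[sender]
    return sorted(
        agent for agent, (x, y) in positions.items()
        if agent != sender and math.hypot(x - sx, y - sy) <= obs_range
    )


def forward_chain(start: int, positions: Dict[int, Position], obs_range: float,
                  score: Callable[[List[int], int], float],
                  max_hops: int) -> List[int]:
    """Forward a message hop by hop, using only local observations.

    `score(chain, candidate)` is a hypothetical placeholder for a learned
    receiver-selection model; here it is an ordinary Python function.
    """
    chain = [start]
    current = start
    for _ in range(max_hops):
        # Assumption: a message is not sent back to an agent already in the chain.
        candidates = [a for a in observable_agents(current, positions, obs_range)
                      if a not in chain]
        if not candidates:
            break  # no unvisited agent is observable, so the chain ends
        current = max(candidates, key=lambda a: score(chain, a))
        chain.append(current)
    return chain


# Four agents on a line; each one observes only its immediate neighbours.
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}

# Toy scoring rule (assumption): prefer the candidate furthest to the right.
chain = forward_chain(start=0, positions=positions, obs_range=1.5,
                      score=lambda chain, a: positions[a][0], max_hops=5)
print(chain)  # [0, 1, 2, 3]
```

In this toy setup agent 0 cannot observe agent 3, yet the message reaches it through two intermediate hops, and no agent ever needs to know the total number of agents. That is the property the abstract relies on when it says the approach copes with a varying number of agents.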