Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication method that exchanges information among agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The method can be seamlessly integrated with any action-value function decomposition algorithm and can be viewed as an orthogonal extension of such decompositions. Notably, it has a fixed number of trainable parameters, independent of the number of agents, which makes it scalable to large systems. Experimental results on the SMACv2 benchmark demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps.
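To illustrate why a self-attention communication layer has a parameter count independent of the number of agents, here is a minimal numpy sketch. The class name `AttentionComm` and the dimensions are hypothetical, not from the paper; the key property shown is that the shared query/key/value projections are reused by every agent, so adding agents adds no parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


class AttentionComm:
    """Hypothetical sketch of self-attention message exchange.

    The three projection matrices are shared across all agents, so the
    trainable parameter count depends only on d_obs and d_msg, never on
    how many agents communicate.
    """

    def __init__(self, d_obs, d_msg, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.normal(0.0, 0.1, (d_obs, d_msg))
        self.Wk = rng.normal(0.0, 0.1, (d_obs, d_msg))
        self.Wv = rng.normal(0.0, 0.1, (d_obs, d_msg))

    def n_params(self):
        return self.Wq.size + self.Wk.size + self.Wv.size

    def forward(self, obs):
        # obs: (n_agents, d_obs) -> messages: (n_agents, d_msg)
        q, k, v = obs @ self.Wq, obs @ self.Wk, obs @ self.Wv
        # Each agent attends over all agents' keys; the attention weights
        # form a row-stochastic (n_agents, n_agents) matrix.
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
        return attn @ v
```

Because every operation above is differentiable, the same structure can be trained end-to-end with the rest of a value-decomposition network; the sketch works unchanged for 3 agents or 300, which is the scalability property the abstract highlights.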