Communication in multi-agent reinforcement learning has recently drawn attention for its significant role in cooperation. In real-world scenarios, however, multi-agent systems may face limited communication resources and therefore need efficient communication techniques. By the Shannon-Hartley theorem, a channel with lower bandwidth or signal-to-noise ratio has lower capacity, so messages must have lower entropy to be transmitted reliably over it. We therefore aim to reduce message entropy in multi-agent communication. A fundamental challenge is that the gradient of entropy is either 0 or infinite, which rules out gradient-based methods. To address this, we propose a pseudo gradient descent scheme that reduces entropy by adjusting the distributions of messages. In experiments on two base communication frameworks across six environment settings, our scheme reduces message entropy by up to 90% with nearly no loss of cooperation performance.
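The two quantities the abstract relies on, channel capacity and message entropy, can be illustrated with a minimal sketch. It is not the proposed scheme. It assumes that messages are discretized by rounding to a grid, which is only one possible reading of why the entropy gradient is 0 or infinite: the entropy of rounded messages is piecewise constant in the underlying values and jumps at bin boundaries. The helper names `channel_capacity`, `empirical_entropy`, and `quantize` and the sample values are hypothetical.

```python
import math
from collections import Counter


def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity in bits per second: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def empirical_entropy(symbols):
    """Empirical Shannon entropy, in bits per symbol, of a discrete sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def quantize(values, step=1.0):
    """Round continuous message values to a grid (assumed discretization)."""
    return [round(v / step) for v in values]


# A worse channel (lower signal-to-noise ratio) has lower capacity.
good_channel = channel_capacity(bandwidth_hz=1000.0, snr_linear=3.0)
bad_channel = channel_capacity(bandwidth_hz=1000.0, snr_linear=1.0)

# Four messages that fall into four different bins.
raw = [0.1, 0.9, 2.1, 2.9]
h_base = empirical_entropy(quantize(raw))

# A tiny shift leaves every message in its bin, so the entropy does not move:
# this is the zero-gradient regime under the rounding assumption.
h_nudged = empirical_entropy(quantize([v + 1e-3 for v in raw]))

# Moving messages so that they share bins changes the distribution and
# lowers the entropy, which a gradient on h_base could not suggest.
h_merged = empirical_entropy(quantize([0.1, 0.4, 2.1, 2.4]))

print(good_channel, bad_channel)
print(h_base, h_nudged, h_merged)
```

In this toy setting, entropy drops only when the message distribution itself is reshaped, which is the kind of adjustment the abstract attributes to the pseudo gradient descent scheme.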