Communication is crucial in multi-agent reinforcement learning when agents cannot observe the full state of the environment. The most common approach to learned communication between agents is a differentiable communication channel, which lets gradients flow between agents as a form of feedback. This becomes challenging when discrete messages are used to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work has proposed methods to deal with this problem, but these methods were tested with different communication learning architectures and in different environments, which makes them hard to compare. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We perform this comparison in the context of communication learning using gradients from other agents, and we run tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA, extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method proposed in this paper achieves the best results of all discretization methods across the different environments. It achieves the best or close to the best performance in each experiment and is the only method that does not fail on any of the tested environments.
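To make the discretization problem concrete, the following minimal sketch contrasts the two modes of the discretize/regularize unit (DRU) used by DIAL with a straight-through variant. The abstract does not define ST-DRU, so `st_dru` is only an assumption suggested by the name: the forward pass sends a hard bit, while the backward pass borrows the gradient of the noisy sigmoid. All function names, the noise level `sigma`, and the hand-written gradients are illustrative and are not taken from the paper.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dru_train(m: float, sigma: float, rng: random.Random):
    """DRU training mode: add Gaussian noise, then squash with a sigmoid.

    Returns the continuous message and its derivative with respect to m,
    so feedback from the receiving agent can be propagated to the sender.
    """
    soft = sigmoid(m + rng.gauss(0.0, sigma))
    return soft, soft * (1.0 - soft)


def dru_execute(m: float) -> float:
    """DRU execution mode: hard threshold. Its derivative is zero almost
    everywhere, which is why gradients cannot pass through it."""
    return 1.0 if m > 0 else 0.0


def st_dru(m: float, sigma: float, rng: random.Random):
    """Hypothetical straight-through DRU (an assumption, see lead-in).

    Forward: a discrete bit obtained by thresholding the noisy message.
    Backward: the sigmoid derivative is used as a surrogate gradient.
    """
    noisy = m + rng.gauss(0.0, sigma)
    hard = 1.0 if noisy > 0 else 0.0
    soft = sigmoid(noisy)
    return hard, soft * (1.0 - soft)


if __name__ == "__main__":
    rng = random.Random(0)
    for logit in (-2.0, 0.5, 3.0):
        message, grad = st_dru(logit, sigma=0.5, rng=rng)
        print(f"logit={logit:+.1f} message={message} surrogate_grad={grad:.4f}")
```

In this sketch the receiving agent always sees a value in {0, 1}, even during training, while the sending agent still receives a nonzero learning signal. An autodiff framework would express the same idea with a custom gradient rule instead of returning the derivative by hand.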