This paper proposes a generative probabilistic model that integrates emergent communication and multi-agent reinforcement learning. The agents plan their actions by probabilistic inference, an approach known as control as inference, and communicate through messages, which are latent variables inferred from the planned actions. Through these messages, each agent can convey information about its own actions and obtain information about the actions of the other agent. As a result, the agents adjust their actions according to the inferred messages to accomplish cooperative tasks. This inference of messages can be regarded as communication, and the procedure can be formulated as the Metropolis-Hastings naming game. Through experiments in a grid-world environment, we show that the proposed probabilistic graphical model (PGM) can infer meaningful messages that achieve the cooperative task.
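To make the message-inference step concrete, the following is a minimal sketch of one exchange in a Metropolis-Hastings naming game. It is not the paper's implementation. It assumes a small discrete message set, and the function name `mh_naming_game_step` and its two probability tables are hypothetical: the speaker proposes a message from its own posterior given its planned actions, and the listener accepts it with a probability given by the ratio of its own likelihoods under the proposed and the current message.

```python
import random


def mh_naming_game_step(rng, speaker_posterior, listener_likelihood, current_message):
    """One speaker-to-listener exchange (illustrative sketch only).

    speaker_posterior[m]   : speaker's P(m | speaker's planned actions), sums to 1
    listener_likelihood[m] : listener's P(listener's planned actions | m)
    current_message        : index of the message the listener currently holds
    Returns (message_after_step, accepted).
    """
    num_messages = len(speaker_posterior)

    # The speaker samples a proposal from its own posterior over messages.
    proposal = rng.choices(range(num_messages), weights=speaker_posterior)[0]

    # The listener judges the proposal using only its own model.
    denominator = listener_likelihood[current_message]
    if denominator == 0:
        # Assumption of this sketch: a current message with zero likelihood
        # is always replaced.
        ratio = 1.0
    else:
        ratio = listener_likelihood[proposal] / denominator

    # Standard Metropolis-Hastings accept/reject decision.
    accepted = rng.random() < min(1.0, ratio)
    return (proposal, True) if accepted else (current_message, False)


if __name__ == "__main__":
    rng = random.Random(0)
    message = 0
    # Hypothetical tables for three candidate messages.
    speaker_posterior = [0.2, 0.7, 0.1]
    listener_likelihood = [0.3, 0.6, 0.1]
    for _ in range(5):
        message, accepted = mh_naming_game_step(
            rng, speaker_posterior, listener_likelihood, message
        )
        print(message, accepted)
```

In a full game the two agents would alternate the speaker and listener roles and re-plan their actions after each accepted message. Neither agent needs access to the other's internal tables, which is what allows the exchange to be read as communication.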