Recent years have seen deep reinforcement learning techniques applied to cooperative multi-agent systems with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how their learning power should be enhanced to address the problems on which they fail. In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as a number of joint actions that grows exponentially with the number of agents, or the lack of an explicit coordination mechanism. Our results extend those in [4]: they quantify how well various approaches can represent the requisite value functions and help us identify factors that can impede good performance, such as sparsity of the values or overly tight coordination requirements.