Parameter sharing, where each agent independently learns a policy with parameters fully shared among all policies, is a popular baseline method for multi-agent deep reinforcement learning. Unfortunately, since all agents share the same policy network, they cannot learn different policies or tasks. This issue has been circumvented experimentally by adding an agent-specific indicator signal to observations, which we term "agent indication". Agent indication is limited, however, in that without modification it does not allow parameter sharing to be applied to environments where the action spaces and/or observation spaces are heterogeneous. This work formalizes the notion of agent indication and proves, for the first time, that it enables convergence to optimal policies. Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces, and prove that these methods allow for convergence to optimal policies. Finally, we confirm experimentally that the methods we introduce work in practice, and conduct a wide array of experiments studying the empirical efficacy of many different agent indication schemes for image-based observation spaces.
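As a rough illustration of the idea of agent indication, the sketch below appends a one-hot agent indicator to each agent's flat observation so that a single shared policy network can tell agents apart. The zero-padding step used to equalize observation lengths is an assumption made for this example and is only one possible way to handle heterogeneous observation spaces; the function names (`pad_observation`, `add_agent_indicator`, `shared_policy_input`) and the two-agent data are hypothetical.

```python
from typing import Dict, List, Sequence


def pad_observation(obs: Sequence[float], target_len: int) -> List[float]:
    """Zero-pad a flat observation so all agents share one input length.

    Assumption of this sketch: heterogeneous observations are flat vectors
    and can be made uniform by padding with zeros.
    """
    if len(obs) > target_len:
        raise ValueError("observation is longer than the padded length")
    return list(obs) + [0.0] * (target_len - len(obs))


def add_agent_indicator(
    obs: Sequence[float], agent_idx: int, n_agents: int
) -> List[float]:
    """Append a one-hot agent indicator to a flat observation."""
    if not 0 <= agent_idx < n_agents:
        raise ValueError("agent_idx out of range")
    one_hot = [0.0] * n_agents
    one_hot[agent_idx] = 1.0
    return list(obs) + one_hot


def shared_policy_input(
    obs: Sequence[float], agent_idx: int, n_agents: int, max_obs_len: int
) -> List[float]:
    """Build the input that a single shared policy network would receive."""
    padded = pad_observation(obs, max_obs_len)
    return add_agent_indicator(padded, agent_idx, n_agents)


# Hypothetical two-agent example with observation sizes 2 and 3.
raw_obs: Dict[int, List[float]] = {0: [0.5, -1.0], 1: [0.1, 0.2, 0.3]}
max_len = max(len(o) for o in raw_obs.values())
inputs = {
    i: shared_policy_input(o, i, len(raw_obs), max_len)
    for i, o in raw_obs.items()
}
```

After this transformation every agent's input has the same length, and the trailing one-hot entries differ per agent, which is what would let a shared network condition its behavior on agent identity.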