Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, a general metric for quantifying policy differences among agents is still lacking. Such a metric would not only facilitate evaluating how diversity evolves in multi-agent systems, but also guide the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning conditional representations of agents' decisions, MAPD can compute the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version that quantifies differences among agent policies in specified aspects. Building on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example of MAPD's applications. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and specific behavioral tendencies. Moreover, MADPS exhibits superior performance compared with other parameter-sharing methods.