Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, a general metric for quantifying policy differences among agents is still lacking. Such a metric would not only facilitate evaluating how diversity evolves in multi-agent systems, but also guide the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning conditional representations of agents' decisions, MAPD can compute the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version that quantifies differences among agent policies on specified aspects. Based on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example application of MAPD. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and in specific behavioral tendencies. Moreover, MADPS outperforms other parameter-sharing methods.
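To make the idea of a pairwise policy distance concrete, the following is a minimal sketch and not the MAPD method itself. It assumes discrete action spaces and direct access to each agent's action probabilities, and it substitutes a plain Jensen-Shannon distance averaged over shared observations for the learned conditional representations described above. The grouping rule, the threshold value, and all function names (`policy_distance`, `distance_matrix`, `sharing_groups`) are hypothetical, chosen only to illustrate how a distance matrix might drive a parameter-sharing decision.

```python
import math
from typing import Callable, List, Sequence

# Assumption: a policy maps an observation to a list of action probabilities.
Policy = Callable[[Sequence[float]], List[float]]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def policy_distance(pi_a: Policy, pi_b: Policy,
                    observations: Sequence[Sequence[float]]) -> float:
    """Mean Jensen-Shannon distance between two policies on shared observations."""
    total = 0.0
    for obs in observations:
        # Clamp at zero to guard against tiny negative rounding errors.
        total += math.sqrt(max(js_divergence(pi_a(obs), pi_b(obs)), 0.0))
    return total / len(observations)


def distance_matrix(policies: Sequence[Policy],
                    observations: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise policy distances."""
    n = len(policies)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = policy_distance(
                policies[i], policies[j], observations)
    return dist


def sharing_groups(dist: List[List[float]], threshold: float) -> List[List[int]]:
    """Hypothetical greedy rule: an agent joins the first group whose
    representative (first member) lies within `threshold`, else starts a new one."""
    groups: List[List[int]] = []
    for agent in range(len(dist)):
        for group in groups:
            if dist[group[0]][agent] <= threshold:
                group.append(agent)
                break
        else:
            groups.append([agent])
    return groups


# Toy example: agents 0 and 1 behave identically, agent 2 prefers the other action.
policies: List[Policy] = [
    lambda obs: [0.9, 0.1],
    lambda obs: [0.9, 0.1],
    lambda obs: [0.1, 0.9],
]
observations = [[0.0], [1.0], [2.0]]

dist = distance_matrix(policies, observations)
groups = sharing_groups(dist, threshold=0.1)
print(groups)  # [[0, 1], [2]]
```

In this toy setting, agents whose action distributions coincide end up in one group and could share parameters, while the dissimilar agent keeps its own. Recomputing the matrix periodically during training would be one way to let the grouping change over time.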