The study of behavioral diversity in Multi-Agent Reinforcement Learning (MARL) is a nascent yet promising field. In this context, the present work addresses the question of how to control the diversity of a multi-agent system. Because no existing approach can control diversity to a set value, current solutions focus on blindly promoting it via intrinsic rewards or additional loss functions, effectively changing the learning objective and lacking a principled measure of diversity. To address this, we introduce Diversity Control (DiCo), a method able to control diversity to an exact value of a given metric by representing policies as the sum of a parameter-shared component and dynamically scaled per-agent components. By applying constraints directly to the policy architecture, DiCo leaves the learning objective unchanged, making it applicable to any actor-critic MARL algorithm. We theoretically prove that DiCo achieves the desired diversity, and we provide several experiments, in both cooperative and competitive tasks, showing how DiCo can be employed as a novel paradigm to increase performance and sample efficiency in MARL. Multimedia results are available on the paper's website: https://sites.google.com/view/dico-marl.
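The core architectural idea, a policy formed as the sum of a parameter-shared component and a rescaled per-agent component, can be sketched as follows. This is a minimal illustrative toy, not the paper's formulation: the linear policies, the weight names, and the norm-based scaling rule are all assumptions made for clarity (DiCo derives the exact scaling factor from its chosen diversity metric).

```python
import numpy as np

def dico_style_policy(obs, shared_w, agent_w, target_div):
    """Toy sketch of a DiCo-style policy decomposition.

    The output is a shared (homogeneous) component plus a per-agent
    (heterogeneous) component rescaled so that its magnitude equals a
    desired diversity level. All names here are illustrative.
    """
    shared_out = shared_w @ obs          # parameter-shared component
    het = agent_w @ obs                  # this agent's own component
    # Rescale the heterogeneous part to the target magnitude. The small
    # epsilon guards against division by zero; the real method computes
    # the scale from a principled diversity metric over the agents.
    scale = target_div / (np.linalg.norm(het) + 1e-8)
    return shared_out + scale * het

# Example: two agents sharing `shared_w` but owning distinct `agent_w`.
rng = np.random.default_rng(0)
obs = rng.normal(size=4)
shared_w = rng.normal(size=(2, 4))
agent_ws = [rng.normal(size=(2, 4)) for _ in range(2)]
actions = [dico_style_policy(obs, shared_w, w, target_div=0.5)
           for w in agent_ws]
```

Because the constraint is enforced in the architecture itself (the scale is recomputed at every forward pass), the training objective of the underlying actor-critic algorithm is left untouched.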