Evolutionary science provides evidence that diversity confers resilience. Yet traditional multi-agent reinforcement learning techniques commonly enforce homogeneity to increase training sample efficiency. When a system of learning agents is not constrained to homogeneous policies, individual agents may develop diverse behaviors, resulting in emergent complementarity that benefits the system. Despite this benefit, there is a surprising lack of tools that measure behavioral diversity in systems of learning agents. Such tools would pave the way towards understanding the impact of diversity on collective resilience and performance. In this paper, we introduce System Neural Diversity (SND): a measure of behavioral heterogeneity for multi-agent systems where agents have stochastic policies. We discuss and prove its theoretical properties, and compare it with alternative, state-of-the-art behavioral diversity metrics used in cross-disciplinary domains. Through simulations of a variety of multi-agent tasks, we show how our metric constitutes an important diagnostic tool to analyze latent properties of behavioral heterogeneity. By comparing SND with task reward in static tasks, where the problem does not change during training, we show that it is key to understanding the effectiveness of heterogeneous vs. homogeneous agents. In dynamic tasks, where the problem is affected by repeated disturbances during training, we show that heterogeneous agents are first able to learn specialized roles that allow them to cope with the disturbance, and then retain these roles when the disturbance is removed. SND allows a direct measurement of this latent resilience, while other proxies such as task performance (reward) fail to do so.