Each year, expert-level performance is attained in increasingly complex multiagent domains, with notable examples including Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, in order to enable their safe deployment, identify their limitations, and reveal potential means of improving them. In this paper, we take a step back from performance-focused multiagent learning and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovering behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumptions about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling the coupled understanding of behaviors at the joint and local agent levels, detecting behavior changepoints throughout training, and discovering core behavioral concepts. We also demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain and show that it can disentangle previously trained policies in OpenAI's hide-and-seek domain.