Agent-based modelling (ABM) is a computational approach to modelling complex systems: the modeller specifies the behaviour of the autonomous decision-making components, or agents, and the system dynamics emerge from their interactions. Recent advances in multi-agent reinforcement learning (MARL) have made it feasible to study the equilibria of complex environments in which multiple agents learn simultaneously. However, most ABM frameworks are not RL-native, in that they do not offer concepts and interfaces compatible with using MARL to learn agent behaviours. In this paper, we introduce Phantom, a new open-source framework that bridges the gap between ABM and MARL. Phantom is an RL-driven framework for agent-based modelling of complex multi-agent systems including, but not limited to, economic systems and markets. The framework aims to provide tools that simplify ABM specification in a MARL-compatible way, including features to encode dynamic partial observability, agent utility functions, heterogeneity in agent preferences or types, and constraints on the order in which agents can act (e.g. Stackelberg games, or more complex turn-taking environments). We present these features and their design rationale, together with two new environments built on the framework.