Simulating the economic impact of rationality through reinforcement learning and agent-based modelling

Agent-based models (ABMs) are simulation models used in economics to overcome some of the limitations of traditional frameworks based on general equilibrium assumptions. However, agents within an ABM follow predetermined, not fully rational, behavioural rules, which can be cumbersome to design and difficult to justify. Here we leverage multi-agent reinforcement learning (RL) to expand the capabilities of ABMs by introducing fully rational agents that learn their policy by interacting with the environment and maximising a reward function. Specifically, we propose a 'Rational macro ABM' (R-MABM) framework by extending a paradigmatic macro ABM from the economic literature. We show that gradually substituting ABM firms in the model with RL agents, trained to maximise profits, allows for a thorough study of the impact of rationality on the economy. We find that RL agents spontaneously learn three distinct strategies for maximising profits, with the optimal strategy depending on the level of market competition and rationality. We also find that RL agents with independent policies, and without the ability to communicate with each other, spontaneously learn to segregate into different strategic groups, thus increasing market power and overall profits. Finally, we find that a higher degree of rationality in the economy always improves the macroeconomic environment as measured by total output; depending on the specific rational policy, however, this can come at the cost of higher instability. Our R-MABM framework is general, allows for stable multi-agent learning, and represents a principled and robust direction for extending existing economic simulators.
