Non-cooperative and cooperative games with a very large number of players have many applications but generally become intractable as the number of players increases. Introduced by Lasry and Lions, and by Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely on solving partial or stochastic differential equations with full knowledge of the model. Recently, Reinforcement Learning (RL) has shown promise for solving complex problems at scale. Combining RL with MFGs is a promising way to solve games at very large scale, both in terms of population size and environment complexity. In this survey, we review the rapidly growing literature on RL methods to learn equilibria and social optima in MFGs. We first identify the most common settings of MFGs (static, stationary, and evolutive). We then present a general framework for classical iterative methods (based on best-response computation or policy evaluation) that solve MFGs exactly. Building on these algorithms and the connection with Markov Decision Processes, we explain how RL can be used to learn MFG solutions in a model-free way. Finally, we present numerical illustrations on a benchmark problem and conclude with some perspectives.