The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing family of algorithms designed to solve large-scale multi-agent systems. However, the field currently lacks a standardized evaluation protocol, forcing researchers to rely on bespoke, isolated, and often simplistic environments. This fragmentation makes it difficult to assess the robustness, generalization, and failure modes of emerging methods. To address this gap, we propose a comprehensive benchmark suite for MFGs (Bench-MFG), focusing for clarity on the discrete-time, discrete-space, stationary setting. We introduce a taxonomy of problem classes, ranging from no-interaction and monotone games to potential and dynamics-coupled games, and provide prototypical environments for each. Furthermore, we propose MF-Garnets, a method for generating random MFG instances that enables rigorous statistical testing. We benchmark a variety of learning algorithms across these environments, including a novel black-box approach (MF-PSO) for exploitability minimization. Based on our extensive empirical results, we propose guidelines to standardize future experimental comparisons. Code is available at \href{https://github.com/lorenzomagnino/Bench-MFG}{https://github.com/lorenzomagnino/Bench-MFG}.