Mean Field Control Games (MFCGs) provide a powerful theoretical framework for analyzing systems of infinitely many interacting agents, blending elements of Mean Field Games (MFGs) and Mean Field Control (MFC). However, solving the coupled Hamilton-Jacobi-Bellman and Fokker-Planck equations that characterize MFCG equilibria remains a significant computational challenge, particularly in high-dimensional or complex environments. This paper presents a scalable deep reinforcement learning (RL) approach to approximating equilibrium solutions of MFCGs. Building on previous work, we reformulate the infinite-agent stochastic control problem as a Markov decision process in which each representative agent interacts with the evolving mean field distribution. Taking the actor-critic algorithm of Angiuli et al. (2024) as our baseline, we propose several more scalable and efficient variants that incorporate parallel sample collection (batching), mini-batching, a target network, proximal policy optimization (PPO), generalized advantage estimation (GAE), and entropy regularization. These techniques improve the efficiency, scalability, and training stability of the baseline algorithm. We evaluate our method on a linear-quadratic benchmark problem for which an analytical MFCG equilibrium is available. Our results show that several of the proposed variants converge faster and closely approximate the theoretical optimum, outperforming the baseline by an order of magnitude in sample efficiency. Our work lays the foundation for applying deep RL to more realistic MFCGs, such as large-scale autonomous transportation systems, multi-firm economic competition, and inter-bank borrowing problems.
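Among the techniques listed above, generalized advantage estimation admits a compact illustration. The sketch below is a minimal, generic GAE implementation, not the paper's code; the function name, signature, and default parameters are our own assumptions.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (Schulman et al., 2016).

    rewards: r_0, ..., r_{T-1} for one trajectory.
    values:  critic estimates V(s_0), ..., V(s_T) (length T + 1,
             including the bootstrap value of the final state).
    Returns advantage estimates A_0, ..., A_{T-1}.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1},
    # where delta_t is the one-step TD residual.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

With lam = 1 and gamma = 1 the recursion reduces to Monte Carlo returns minus the baseline, while lam = 0 recovers the one-step TD residual, trading variance against bias.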