Mean Field Games (MFGs) can model large-scale multi-agent systems, but learning Nash equilibria in MFGs remains challenging. In this paper, we propose a deep reinforcement learning (DRL) algorithm, inspired by Munchausen RL and Online Mirror Descent, that learns a population-dependent Nash equilibrium without averaging over or sampling from history. An additional inner-loop replay buffer lets the agents learn to reach a Nash equilibrium from any distribution while mitigating catastrophic forgetting. The resulting policy can be applied to various initial distributions. Numerical experiments on four canonical examples demonstrate that our algorithm has better convergence properties than state-of-the-art algorithms, in particular a DRL version of Fictitious Play for population-dependent policies.
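To make the Munchausen RL ingredient concrete, the following is a minimal sketch of a Munchausen-style regression target as it appears in the general Munchausen RL literature. It is not the exact update of the proposed algorithm. The function names, the hyperparameter defaults, and the convention that the Q-values have already been evaluated on the agent's state together with the current population distribution are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(q, tau):
    """Numerically stable log of the softmax policy pi = softmax(q / tau)."""
    z = q / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def munchausen_target(q_s, q_next, action, reward, done,
                      gamma=0.99, tau=0.03, alpha=0.9, clip_min=-1.0):
    """Munchausen-style target for a batch of transitions.

    q_s, q_next: arrays of shape (batch, n_actions) holding target-network
    Q-values at the current and next inputs. In a population-dependent
    setting these inputs would also include the mean-field distribution
    (an assumption of this sketch, not something computed here).
    """
    log_pi_s = log_softmax(q_s, tau)
    log_pi_next = log_softmax(q_next, tau)
    pi_next = np.exp(log_pi_next)

    # Scaled log-policy bonus for the action taken, clipped for stability.
    taken = log_pi_s[np.arange(len(action)), action]
    bonus = alpha * np.clip(tau * taken, clip_min, 0.0)

    # Entropy-regularized (soft) value of the next state.
    soft_value = (pi_next * (q_next - tau * log_pi_next)).sum(axis=-1)

    return reward + bonus + gamma * (1.0 - done) * soft_value
```

The log-policy bonus implicitly regularizes each new policy toward the previous one, which is the usual link drawn between Munchausen RL and mirror-descent-style updates.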