This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each aims to maximize its own return. Scaling up the number of agents is challenging because of the non-stationarity that many simultaneously learning agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger state spaces. Current methods rely on smoothing techniques such as averaging the Q-values or the updates of the mean-field distribution. This work presents a different approach to stabilizing learning, based on proximal updates of the mean-field policy. We name our algorithm \textit{Mean Field Proximal Policy Optimization (MF-PPO)} and empirically show the effectiveness of our method in the OpenSpiel framework.
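To illustrate what a proximal policy update looks like, a minimal sketch of the standard PPO clipped surrogate objective follows. The exact MF-PPO objective is not given here; we only assume it resembles this form, with advantages estimated while the population (mean-field) distribution is held fixed. The function name `clipped_surrogate` and the clip parameter `eps` are illustrative, not taken from the paper or from OpenSpiel.

```python
def clipped_surrogate(new_probs, old_probs, advantages, eps=0.2):
    """Average PPO clipped surrogate over sampled (state, action) pairs.

    Assumption: the advantages were estimated against a fixed mean-field
    distribution, so the population is treated as stationary during the update.
    """
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old                          # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # keep ratio in [1-eps, 1+eps]
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With `eps=0.1`, a ratio of 1.2 on a positive advantage is clipped to 1.1, so the objective offers no incentive to move the policy further from the old one within a single update; this is the sense in which the update is "proximal".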