Mean Field Control Games (MFCG), introduced in [Angiuli et al., 2022a], represent competitive games between a large number of large collaborative groups of agents, in the limit where both the number and the size of the groups tend to infinity. In this paper, we prove the convergence of a three-timescale reinforcement learning (RL) algorithm, based on Q-learning, that solves MFCGs in a model-free fashion from the point of view of representative agents. Our analysis uses a Q-table for finite state and action spaces, updated at each discrete time step over an infinite horizon. In [Angiuli et al., 2023], we proved the convergence of two-timescale algorithms for MFG and MFC separately, highlighting the need to track multiple population distributions in the MFC case. Here, we integrate this feature into the MFCG setting, together with three learning rates decreasing to zero in the proper ratios. Our proof technique generalizes the two-timescale analysis of [Borkar, 1997] to three timescales. We conclude with a simple example that satisfies the various hypotheses made in the convergence proof and illustrates the performance of the algorithm.
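To make the three-timescale structure concrete, the following is a minimal sketch of such a Q-learning loop for a finite MFCG environment. Everything here is an assumption for illustration: the environment interface `env.step(s, a, mu, mu_tilde)` and discount `env.gamma` are hypothetical, the polynomial learning-rate schedules and the exponents `omega_*` are placeholders, and the relative speeds of the three rates are purely illustrative; the paper's assumptions prescribe the actual ratios required for convergence.

```python
import numpy as np

def three_timescale_qlearning(env, n_states, n_actions, n_steps,
                              omega_q=0.55, omega_mu=0.85, omega_tilde=0.65,
                              eps=0.1, seed=0):
    """Sketch of a three-timescale Q-learning loop for an MFCG.

    The Q-table, the local (competition) distribution mu, and the global
    (collaboration) distribution mu_tilde are each updated at every step
    with their own learning rate rho_k = 1/(1+k)**omega, decreasing to
    zero at different polynomial speeds. The particular exponents above
    are illustrative only, not the ratios prescribed by the proof.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)        # competition distribution
    mu_tilde = np.full(n_states, 1.0 / n_states)  # collaboration distribution
    s = rng.integers(n_states)
    for k in range(n_steps):
        # epsilon-greedy action from the current Q-table
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        # hypothetical environment: reward depends on both distributions
        s_next, r = env.step(s, a, mu, mu_tilde)
        # three learning rates, decreasing to zero at different speeds
        rho_q = 1.0 / (1 + k) ** omega_q
        rho_mu = 1.0 / (1 + k) ** omega_mu
        rho_tilde = 1.0 / (1 + k) ** omega_tilde
        # Q-table update on its own timescale
        Q[s, a] += rho_q * (r + env.gamma * Q[s_next].max() - Q[s, a])
        # distribution updates: move each empirical distribution toward
        # the Dirac mass at the visited state, each at its own timescale
        e_s = np.eye(n_states)[s_next]
        mu += rho_mu * (e_s - mu)
        mu_tilde += rho_tilde * (e_s - mu_tilde)
        s = s_next
    return Q, mu, mu_tilde
```

Because each distribution update is a convex combination of the current distribution and a Dirac mass, `mu` and `mu_tilde` remain probability vectors throughout; the separation of the three rates is what lets the fast variables equilibrate relative to the slow ones, which is the mechanism the three-timescale analysis formalizes.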