Learning in zero-sum games studies situations in which multiple agents competitively learn their strategies. In such multi-agent learning, the strategies are often observed to cycle around their optimum, i.e., the Nash equilibrium. When a game varies periodically (a so-called "periodic" game), however, the Nash equilibrium generically moves. How learning dynamics behave in such periodic games is of interest but remains unclear. Interestingly, we discover that the behavior depends strongly on the relationship between two speeds: the speed at which the game changes and the speed at which the players learn. When these two speeds synchronize, the learning dynamics diverge and their time-average does not converge. Otherwise, the learning dynamics trace complicated cycles, but their time-average converges. Under some assumptions introduced for the dynamical-systems analysis, we prove that this behavior occurs. Furthermore, our experiments show that the behavior persists even when these assumptions are removed. This study discovers a novel phenomenon, synchronization, and provides insights widely applicable to learning in periodic games.
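The dependence on the two speeds can be illustrated with a minimal sketch. This is not the authors' model. It assumes a hypothetical toy game with unconstrained scalar strategies, f(x, y, t) = (x - amp*cos(omega*t)) * y, where x minimizes and y maximizes, and it uses continuous-time gradient descent-ascent (GDA) with learning rate eta in place of whatever learning dynamics the paper analyzes. In this toy game the Nash equilibrium (amp*cos(omega*t), 0) moves with angular frequency omega, while GDA by itself rotates around the equilibrium with angular frequency eta. Eliminating y gives x'' + eta^2 x = eta^2 amp cos(omega t), a forced oscillator, so synchronization here is the textbook resonance at omega = eta. The function names `gda_rhs` and `simulate` are hypothetical.

```python
import math


def gda_rhs(t, x, y, eta, omega, amp):
    """Vector field of continuous-time GDA on the hypothetical toy game
    f(x, y, t) = (x - amp*cos(omega*t)) * y  (x minimizes, y maximizes).

    The Nash equilibrium at time t is (amp*cos(omega*t), 0), so it moves
    periodically with angular frequency omega (the speed of the game).
    """
    c = amp * math.cos(omega * t)  # current position of the moving equilibrium
    dx = -eta * y                  # descent for x: -eta * df/dx
    dy = eta * (x - c)             # ascent for y:  +eta * df/dy
    return dx, dy


def simulate(eta, omega, amp=1.0, t_end=200.0, dt=0.01):
    """Integrate GDA from (0, 0) with the classical RK4 scheme.

    Returns (max_radius, (avg_x, avg_y)): the largest distance from the
    origin reached along the trajectory, and the time-average of the
    strategies over [0, t_end] (left Riemann sum).
    """
    x, y, t = 0.0, 0.0, 0.0
    steps = int(round(t_end / dt))
    max_radius = 0.0
    sum_x = sum_y = 0.0
    for _ in range(steps):
        # Accumulate the time integral of the current state.
        sum_x += x * dt
        sum_y += y * dt
        # One RK4 step for the non-autonomous system.
        k1 = gda_rhs(t, x, y, eta, omega, amp)
        k2 = gda_rhs(t + dt / 2, x + dt / 2 * k1[0], y + dt / 2 * k1[1], eta, omega, amp)
        k3 = gda_rhs(t + dt / 2, x + dt / 2 * k2[0], y + dt / 2 * k2[1], eta, omega, amp)
        k4 = gda_rhs(t + dt, x + dt * k3[0], y + dt * k3[1], eta, omega, amp)
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        max_radius = max(max_radius, math.hypot(x, y))
    return max_radius, (sum_x / t_end, sum_y / t_end)


if __name__ == "__main__":
    eta = 1.0  # learning speed: natural rotation frequency of GDA in this toy game
    for omega in (1.0, 2.0):  # speed at which the game changes
        r, avg = simulate(eta, omega)
        print(f"omega={omega}: max radius={r:.2f}, "
              f"time-average=({avg[0]:.3f}, {avg[1]:.3f})")
```

For omega = 1.0 (the game changes at the learning speed), the orbit from (0, 0) grows roughly like t/2, and the time-average keeps rotating on a circle of radius about 1/2 instead of settling. For omega = 2.0 the orbit stays bounded, and the time-average approaches (0, 0), which is the time-average of the moving equilibrium. A fixed-step RK4 integrator is used only to keep the sketch dependency-free; an adaptive ODE solver would serve equally well.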