This study examines the global behavior of learning dynamics in games between two players, X and Y. We consider the simplest form of memory asymmetry between the two players: X memorizes its opponent Y's previous action and uses reactive strategies, while Y has no memory. Although this memory complicates the learning dynamics, we discover two novel quantities that characterize their global behavior. One is an extension of the Kullback-Leibler divergence from the Nash equilibrium, which previous studies have established as a conserved quantity. The other is a family of Lyapunov functions of X's reactive strategy. Together, these two quantities capture a global behavior in which X's strategy becomes increasingly exploitative while the exploited Y's strategy converges to the Nash equilibrium. Indeed, we theoretically prove that Y's strategy globally converges to the Nash equilibrium in the simplest game that has an equilibrium in the interior of the strategy space. Furthermore, our experiments suggest that this global convergence holds universally in zero-sum games more complex than the simplest one. This study thus provides a novel characterization of the global behavior of learning in games through a pair of indicators.
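To make the memoryless baseline concrete, the sketch below checks numerically that the Kullback-Leibler divergence from an interior Nash equilibrium stays constant when both players are memoryless. It is only an illustration under stated assumptions. The game (matching pennies), the learning rule (continuous-time replicator dynamics), the RK4 integrator, and the function names `replicator_rhs`, `kl_from_nash`, `rk4_step`, and `simulate` are all hypothetical choices made here. The sketch does not implement the extended divergence or the Lyapunov functions for X's reactive strategy that the study introduces.

```python
import math


def replicator_rhs(x, y):
    """Replicator dynamics for matching pennies (assumed example game).

    x: probability that X plays action 0; y: probability that Y plays action 0.
    X's payoff matrix is A = [[1, -1], [-1, 1]]; Y receives -A^T (zero-sum).
    """
    dx = x * (1.0 - x) * (4.0 * y - 2.0)  # payoff gap (Ay)_0 - (Ay)_1 = 4y - 2
    dy = y * (1.0 - y) * (2.0 - 4.0 * x)  # payoff gap for Y = 2 - 4x
    return dx, dy


def kl_from_nash(p, p_star=0.5):
    """KL divergence D(p* || p) for a two-action mixed strategy."""
    return (p_star * math.log(p_star / p)
            + (1.0 - p_star) * math.log((1.0 - p_star) / (1.0 - p)))


def rk4_step(x, y, h):
    """One classical Runge-Kutta (RK4) step of size h."""
    k1 = replicator_rhs(x, y)
    k2 = replicator_rhs(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = replicator_rhs(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = replicator_rhs(x + h * k3[0], y + h * k3[1])
    x_new = x + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
    y_new = y + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return x_new, y_new


def simulate(x0=0.8, y0=0.3, h=1e-3, steps=20000):
    """Integrate the dynamics and record D(x*||x) + D(y*||y) at each step."""
    x, y = x0, y0
    divergences = [kl_from_nash(x) + kl_from_nash(y)]
    for _ in range(steps):
        x, y = rk4_step(x, y, h)
        divergences.append(kl_from_nash(x) + kl_from_nash(y))
    return divergences


if __name__ == "__main__":
    H = simulate()
    drift = max(abs(v - H[0]) for v in H)
    print(f"initial: {H[0]:.6f}  final: {H[-1]:.6f}  max drift: {drift:.2e}")
```

The strategies themselves cycle around the equilibrium (1/2, 1/2) without converging. Only the summed divergence stays fixed, up to a tiny integration error. This is the memoryless picture that the study's memory-asymmetric setting departs from.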