This study examines the global behavior of learning dynamics in games between two players, X and Y. We consider the simplest form of memory asymmetry between the players: X memorizes the other player Y's previous action and uses reactive strategies, while Y has no memory. Although this memory complicates the learning dynamics, we characterize their global behavior by discovering and analyzing two novel quantities. One is an extension of the Kullback-Leibler divergence from the Nash equilibrium, a conserved quantity well known from previous studies. The other is a family of Lyapunov functions of X's reactive strategy. One global behavior we capture is that if X exploits Y, their strategies converge to the Nash equilibrium. Another is that if Y's strategy is out of equilibrium, X becomes more exploitative over time. Consequently, we suggest global convergence to the Nash equilibrium from both theoretical and experimental viewpoints. This study thus provides a novel characterization of the global behavior of learning in games through a pair of indicators.
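The conserved quantity mentioned above extends a classical one: for memoryless learning in zero-sum games with an interior Nash equilibrium, the sum of Kullback-Leibler divergences from the equilibrium is conserved along continuous-time replicator dynamics. A minimal sketch of that baseline case (the game, initial conditions, and RK4 integrator are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Matching pennies: a zero-sum game whose unique Nash equilibrium is
# the interior point x* = y* = (1/2, 1/2).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # payoff to X; Y receives the negative

def kl(p, q):
    """Kullback-Leibler divergence D(p || q)."""
    return float(np.sum(p * np.log(p / q)))

def vector_field(x, y):
    """Continuous-time replicator dynamics for both players."""
    ux = A @ y            # X's payoff vector against Y's mixed strategy
    uy = -A.T @ x         # Y's payoff vector (zero-sum)
    dx = x * (ux - x @ ux)
    dy = y * (uy - y @ uy)
    return dx, dy

def rk4_step(x, y, dt):
    """One classical Runge-Kutta step for the coupled dynamics."""
    k1x, k1y = vector_field(x, y)
    k2x, k2y = vector_field(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = vector_field(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = vector_field(x + dt * k3x, y + dt * k3y)
    x = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    y = y + dt / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
    return x, y

eq = np.array([0.5, 0.5])
x = np.array([0.8, 0.2])   # arbitrary interior initial strategies
y = np.array([0.3, 0.7])

h0 = kl(eq, x) + kl(eq, y)      # D(x*||x) + D(y*||y) at t = 0
for _ in range(10_000):         # integrate up to t = 10
    x, y = rk4_step(x, y, 1e-3)
h1 = kl(eq, x) + kl(eq, y)

print(f"initial H = {h0:.6f}, drift = {abs(h1 - h0):.2e}")
```

The strategies orbit the equilibrium without converging, and the numerical drift in H stays at the integrator's error scale, illustrating the conservation law that the paper's extended divergence generalizes to the reactive-strategy setting.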