Learning in games studies how multiple agents maximize their own rewards through repeated play. Memory, the ability of an agent to change its action depending on the history of actions in previous games, is often introduced into learning to explore more sophisticated strategies and to discuss the decision-making of real agents such as humans. However, games with memory are hard to analyze because they exhibit complex phenomena such as chaotic dynamics or divergence from the Nash equilibrium. In particular, how asymmetry in memory capacities between agents affects learning in games remains unclear. In response, this study formulates a gradient ascent algorithm for games with asymmetric memory capacities. To obtain theoretical insights into the learning dynamics, we first consider a simple case of zero-sum games. We observe complex behavior, in which the learning dynamics trace heteroclinic connections from unstable fixed points to stable ones. Despite this complexity, we analyze the learning dynamics and prove local convergence to these stable fixed points, i.e., the Nash equilibria. We identify the mechanism driving this convergence: an agent with a longer memory learns to exploit the other, which in turn endows the other's utility function with strict concavity. We further observe such convergence numerically across various initial strategies, numbers of actions, and memory lengths. This study reveals a novel phenomenon caused by memory asymmetry, advancing the fundamental understanding of learning in games and offering new insights into computing equilibria.
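The setting described above can be illustrated with a minimal sketch. It is not the algorithm formulated in this study; it is an assumed toy instance: matching pennies (a two-action zero-sum game), where agent X conditions its action on the previous joint action (one-step memory) and agent Y is memoryless. Payoffs are evaluated in the stationary state of the induced Markov chain, and both agents take simultaneous projected gradient steps with finite-difference gradients. The names `stationary`, `payoff`, `clip`, and `learn`, as well as the step size, clipping margin, and iteration counts, are hypothetical choices made only for illustration.

```python
# Assumed toy setup: matching pennies, X has one-step memory, Y has none.
# x[i] = probability that X plays action 0 after previous joint action STATES[i].
# y    = probability that Y plays action 0 (independent of history).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def stationary(x, y, iters=200):
    """Approximate the stationary distribution over joint actions by power iteration."""
    p = [0.25] * 4
    for _ in range(iters):
        q = [0.0] * 4
        for i in range(4):
            for j, (a, b) in enumerate(STATES):
                pa = x[i] if a == 0 else 1.0 - x[i]   # X reacts to previous state i
                pb = y if b == 0 else 1.0 - y          # Y ignores the history
                q[j] += p[i] * pa * pb
        p = q
    return p


def payoff(x, y):
    """Stationary expected payoff of X: +1 on a match, -1 otherwise (Y receives the negative)."""
    p = stationary(x, y)
    return p[0] - p[1] - p[2] + p[3]


def clip(v, eps=0.05):
    """Project a probability onto [eps, 1 - eps] so the Markov chain stays ergodic."""
    return min(1.0 - eps, max(eps, v))


def learn(x, y, steps=100, lr=0.05, h=1e-5):
    """Simultaneous projected gradient ascent with central finite differences."""
    x = list(x)
    for _ in range(steps):
        gx = []
        for i in range(4):
            xp = x[:]
            xm = x[:]
            xp[i] += h
            xm[i] -= h
            gx.append((payoff(xp, y) - payoff(xm, y)) / (2 * h))
        gy = (payoff(x, y + h) - payoff(x, y - h)) / (2 * h)
        x = [clip(x[i] + lr * gx[i]) for i in range(4)]  # X ascends its payoff
        y = clip(y - lr * gy)                            # Y ascends the negated payoff
    return x, y
```

Calling, for example, `learn([0.3, 0.6, 0.2, 0.9], 0.4)` returns the updated strategies, which can be logged over time to inspect the trajectory. The sketch makes no claim about reproducing the convergence results reported here; it only makes the asymmetric-memory setup concrete.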