Learning in games studies how multiple agents maximize their own rewards through repeated play. Memory, the ability of an agent to change its action depending on the history of actions in previous games, is often introduced into learning to explore more sophisticated strategies and to model the decision-making of real agents such as humans. However, games with memory are hard to analyze because they exhibit complex phenomena such as chaotic dynamics and divergence from Nash equilibrium. In particular, how asymmetry in memory capacities between agents affects learning in games remains unclear. In response, this study formulates a gradient ascent algorithm for games with asymmetric memory capacities. To obtain theoretical insight into the learning dynamics, we first consider the simple case of zero-sum games. We observe complex behavior in which the learning dynamics trace a heteroclinic connection from unstable fixed points to stable ones. Despite this complexity, we analyze the learning dynamics and prove local convergence to these stable fixed points, i.e., the Nash equilibria. We identify the mechanism driving this convergence: the agent with the longer memory learns to exploit the other, which in turn endows the other's utility function with strict concavity. We further observe such convergence numerically across various initial strategies, numbers of actions, and memory lengths. This study reveals a novel phenomenon caused by memory asymmetry, advancing the fundamentals of learning in games and offering new insights into computing equilibria.
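To make the setting concrete, the following is a minimal sketch of gradient ascent under memory asymmetry, not the algorithm formulated in this study. It assumes matching pennies as the zero-sum game, an agent X that conditions on the previous joint action (memory length one), and an agent Y with no memory. The names `stationary_payoff`, `numeric_grad`, and `step`, the finite-difference gradient, and the clipping to the interior of the strategy space are all illustrative assumptions.

```python
import numpy as np

# Matching pennies: X gets +1 when the actions match, -1 otherwise; Y gets the negative.
# Joint states (a, b) are ordered (0,0), (0,1), (1,0), (1,1).
U = np.array([1.0, -1.0, -1.0, 1.0])


def stationary_payoff(x, y):
    """X's expected payoff under the stationary distribution of repeated play.

    x: shape (4,), probability that X plays action 0 given the previous joint
       state (an assumed one-step-memory parameterization).
    y: probability that Y, who has no memory, plays action 0.
    """
    py = np.array([y, 1.0 - y])
    # M[s, s2] = probability of moving from joint state s to joint state s2.
    M = np.stack([np.outer([x[s], 1.0 - x[s]], py).ravel() for s in range(4)])
    # Solve p M = p together with sum(p) = 1 in the least-squares sense.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(p @ U)


def numeric_grad(f, v, h=1e-6):
    """Central finite-difference gradient of f at the vector v."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2.0 * h)
    return g


def step(x, y, lr=0.05, eps=1e-3):
    """One simultaneous gradient ascent step; each agent ascends its own payoff."""
    gx = numeric_grad(lambda v: stationary_payoff(v, y), x)
    # Y's utility is the negative of X's payoff (zero-sum game).
    gy = numeric_grad(lambda v: -stationary_payoff(x, v[0]), np.array([y]))[0]
    # Clip to the interior so the Markov chain stays ergodic (assumption of this sketch).
    x_new = np.clip(x + lr * gx, eps, 1.0 - eps)
    y_new = float(np.clip(y + lr * gy, eps, 1.0 - eps))
    return x_new, y_new
```

Calling `step` repeatedly from an interior starting point, for example `x = np.full(4, 0.5)` and `y = 0.8`, traces one learning trajectory. Whether that trajectory approaches an equilibrium depends on the step size and the initial strategies, and this sketch makes no claim about it.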