This study provides a new convergence mechanism for learning in games. Learning in games studies how multiple agents maximize their own rewards through repeated play of a game. In two-player zero-sum games in particular, the agents compete for rewards, so each agent's reward depends on the opponent's strategy. A critical problem therefore emerges when both agents learn their strategies with standard algorithms such as replicator dynamics and gradient ascent: their learning dynamics often cycle and fail to converge to their optimal strategies, i.e., the Nash equilibrium. We tackle this problem from a novel perspective: asymmetry between the agents' learning algorithms. We consider games with memory (with-memory games), in which agents store previously played actions and use them to choose their subsequent actions. In such games, we focus on asymmetry in memory capacity between the agents. Interestingly, we demonstrate, both theoretically and experimentally, that the learning dynamics converge to the Nash equilibrium when the agents have different memory capacities. Moreover, we give an interpretation of this convergence: the agent with the longer memory can use a more complex strategy, which endows the other agent's utility with strict concavity.
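To make the cycling problem concrete, here is a minimal sketch of the baseline failure mode only, not the authors' with-memory model. It makes several assumptions that do not come from the source. The game is matching pennies and both agents are memoryless. `x` and `y` are the probabilities that each agent plays Heads. The replicator ODE is integrated with fixed-step RK4. All function names are hypothetical. In this game, the sum of KL divergences from the uniform Nash equilibrium is conserved along exact trajectories. The state therefore orbits the equilibrium at a roughly constant distance and never approaches it.

```python
"""Replicator dynamics in matching pennies: an illustration of cycling.

Assumptions (not from the source): the game is matching pennies, both agents
are memoryless, and the ODE is integrated with a fixed-step RK4 scheme.
All names below are hypothetical, chosen for illustration.
"""
import math


def replicator_field(x: float, y: float) -> tuple[float, float]:
    """Return (dx/dt, dy/dt) for matching pennies.

    x: probability that agent X plays Heads; X gets +1 on a match, -1 otherwise.
    y: probability that agent Y plays Heads; Y gets X's payoff negated (zero-sum).
    """
    dx = 2.0 * x * (1.0 - x) * (2.0 * y - 1.0)
    dy = -2.0 * y * (1.0 - y) * (2.0 * x - 1.0)
    return dx, dy


def rk4_step(x: float, y: float, dt: float) -> tuple[float, float]:
    """Advance the replicator ODE by one classical Runge-Kutta (RK4) step."""
    k1 = replicator_field(x, y)
    k2 = replicator_field(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
    k3 = replicator_field(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
    k4 = replicator_field(x + dt * k3[0], y + dt * k3[1])
    x_new = x + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    y_new = y + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x_new, y_new


def conserved_quantity(x: float, y: float) -> float:
    """Sum of KL divergences from the uniform equilibrium, up to a constant.

    Its time derivative is zero along exact replicator trajectories of this game.
    """
    return -0.5 * (math.log(x) + math.log(1 - x) + math.log(y) + math.log(1 - y))


def simulate(x0: float = 0.8, y0: float = 0.5, dt: float = 0.01,
             steps: int = 5000) -> list[tuple[float, float]]:
    """Integrate from (x0, y0) and return the full trajectory."""
    traj = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        x, y = rk4_step(x, y, dt)
        traj.append((x, y))
    return traj


if __name__ == "__main__":
    traj = simulate()
    dists = [math.hypot(x - 0.5, y - 0.5) for x, y in traj]
    drift = abs(conserved_quantity(*traj[-1]) - conserved_quantity(*traj[0]))
    print(f"distance to Nash equilibrium: min={min(dists):.3f}, max={max(dists):.3f}")
    print(f"drift of conserved quantity: {drift:.2e}")
```

Starting from `(0.8, 0.5)`, the printed minimum distance to the equilibrium `(0.5, 0.5)` stays near its initial value rather than shrinking toward zero. The conserved quantity drifts only by integration error. This is the kind of non-convergent cycling that the memory-asymmetry mechanism described above is meant to break.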