Repeated games model situations in which multiple agents learn while each is driven by its own, independent reward. In general, the resulting learning dynamics are complex. In particular, when the rewards conflict, as in zero-sum games, the dynamics often fail to converge to the optimum, namely the Nash equilibrium. To tackle this complexity, many studies have analyzed learning algorithms as dynamical systems and obtained qualitative insights into how the algorithms behave. However, such studies have not yet addressed multi-memory games, in which agents memorize the actions played in past rounds and choose their actions based on these memories, even though memory plays a pivotal role in both artificial intelligence and interpersonal relationships. This study extends two major learning algorithms in games, replicator dynamics and gradient ascent, to multi-memory games and proves that their dynamics are identical. Furthermore, we show theoretically and experimentally that in multi-memory zero-sum games the learning dynamics diverge from the Nash equilibrium and approach heteroclinic cycles, sojourning longer and longer near the boundary of the strategy space. These results provide a fundamental advance in the study of learning in games.
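To make the notion of a multi-memory game concrete, the sketch below evaluates the simplest case, memory length one, in matching pennies. It rests on assumptions of ours, not the paper's formulation: two players with two actions each, every player conditions only on the previous round's joint action, strategies are given as dictionaries x[s] and y[s] (probability of playing action 0 after state s), and the long-run payoff is taken from the stationary distribution of the induced Markov chain. The helper names (transition_matrix, stationary_distribution, expected_payoff_x) are hypothetical, and the code does not implement the replicator dynamics or gradient ascent studied in the paper; it only shows how memory turns strategies into conditional probabilities.

```python
from itertools import product

# Matching pennies: X gets +1 if the actions match, -1 otherwise; Y gets the negative.
PAYOFF_X = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
# With memory length one, a state is the previous joint action (a_prev, b_prev).
STATES = list(product((0, 1), repeat=2))


def transition_matrix(x, y):
    """Hypothetical helper: x[s], y[s] = probability that X / Y plays action 0 after state s."""
    m = {}
    for s in STATES:
        for a, b in STATES:  # the joint action played now becomes the next state
            pa = x[s] if a == 0 else 1.0 - x[s]
            pb = y[s] if b == 0 else 1.0 - y[s]
            m[s, (a, b)] = pa * pb
    return m


def stationary_distribution(m, iters=10_000, tol=1e-12):
    """Power iteration; assumes all transition probabilities are strictly positive."""
    p = {s: 1.0 / len(STATES) for s in STATES}
    for _ in range(iters):
        q = {t: sum(p[s] * m[s, t] for s in STATES) for t in STATES}
        if max(abs(q[t] - p[t]) for t in STATES) < tol:
            return q
        p = q
    return p


def expected_payoff_x(x, y):
    """Long-run average payoff of X under the stationary distribution."""
    p = stationary_distribution(transition_matrix(x, y))
    return sum(p[s] * PAYOFF_X[s] for s in STATES)
```

For example, if Y ignores its memory and plays each action with probability 1/2 (its Nash strategy), X's expected payoff is 0 whatever conditional strategy X uses; with memoryless strategies x = 0.8 and y = 0.7 for action 0, X's payoff is 0.62 - 0.38 = 0.24.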