Repeated games model situations in which multiple agents learn while each is motivated by its own reward. In general, the resulting learning dynamics are complex. In particular, when the agents' rewards conflict, as in zero-sum games, the dynamics often fail to converge to the optimum, i.e., the Nash equilibrium. To tackle this complexity, many studies have treated various learning algorithms as dynamical systems and derived qualitative insights into them. However, such studies have yet to handle multi-memory games (where agents can memorize the actions they played in the past and choose their actions based on those memories), even though memorization plays a pivotal role in artificial intelligence and interpersonal relationships. This study extends two major learning algorithms in games, i.e., replicator dynamics and gradient ascent, to multi-memory games. We then prove that the two dynamics are identical. Furthermore, we show both theoretically and experimentally that the learning dynamics diverge from the Nash equilibrium in multi-memory zero-sum games and approach heteroclinic cycles (sojourning longer near the boundary of the strategy space), providing a fundamental advance in learning in games.
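The failure to converge in zero-sum games can be illustrated with a minimal sketch of the classical memoryless baseline. It is not the multi-memory extension proposed in this study. The sketch assumes matching pennies as the game and the standard two-population replicator dynamics; the function names (`replicator_field`, `rk4_step`, `conserved`, `simulate`), the initial point, and the Runge-Kutta integrator are illustrative choices, not taken from the paper. Here `x` and `y` are the probabilities that each player chooses "heads", and the Nash equilibrium is `(0.5, 0.5)`.

```python
import math


def replicator_field(x, y):
    """Replicator vector field for matching pennies (assumed example game).

    Player X earns +1 when the actions match and -1 otherwise;
    player Y earns the negative of that (zero-sum).
    """
    dx = 2.0 * x * (1.0 - x) * (2.0 * y - 1.0)
    dy = -2.0 * y * (1.0 - y) * (2.0 * x - 1.0)
    return dx, dy


def rk4_step(x, y, dt):
    """One classical fourth-order Runge-Kutta step of the dynamics."""
    k1x, k1y = replicator_field(x, y)
    k2x, k2y = replicator_field(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = replicator_field(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = replicator_field(x + dt * k3x, y + dt * k3y)
    x_next = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    y_next = y + dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return x_next, y_next


def conserved(x, y):
    """Quantity that stays constant along exact trajectories of this system."""
    return math.log(x * (1.0 - x)) + math.log(y * (1.0 - y))


def simulate(x0=0.6, y0=0.5, dt=0.01, steps=5000):
    """Integrate from (x0, y0) and return the list of visited strategy pairs."""
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        x, y = rk4_step(x, y, dt)
        trajectory.append((x, y))
    return trajectory


trajectory = simulate()
# Euclidean distance of each visited point from the Nash equilibrium (0.5, 0.5).
distances = [math.hypot(x - 0.5, y - 0.5) for x, y in trajectory]
drift = abs(conserved(*trajectory[-1]) - conserved(*trajectory[0]))
print("min distance:", min(distances))
print("max distance:", max(distances))
print("drift of conserved quantity:", drift)
```

In this memoryless case the trajectory circles the Nash equilibrium on a closed orbit: its distance from the equilibrium never shrinks toward zero, and the conserved quantity stays constant up to integration error. The study's claim concerns the multi-memory case, where the dynamics are reported to move away from the equilibrium rather than stay on such an orbit.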