We explore the behaviour that emerges when learning agents interact strategically and repeatedly, for a wide range of learning dynamics including $Q$-learning, projected gradient, replicator and log-barrier dynamics. Going beyond the better-understood classes of potential games and zero-sum games, we consider the setting of a general repeated game with finite recall under different forms of monitoring. We obtain a Folk Theorem-style result and characterise the set of payoff vectors that can be obtained by these dynamics, discovering a wide range of possibilities for the emergence of algorithmic collusion. Achieving this requires a novel technical approach, which, to the best of our knowledge, yields the first convergence result for multi-agent $Q$-learning algorithms in repeated games.