When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of each other's existence and use multi-armed bandit algorithms to choose their actions; we refer to this setting as the ``blindfolded game'' in this paper. We show that when the players use Thompson sampling, the game dynamics converge to the Nash equilibrium under a mild assumption on the payoff matrices. Therefore, algorithmic collusion does not arise in this case, even though the players do not intentionally deploy competitive strategies. In proving the convergence result, we find that the framework developed in stochastic approximation does not apply, because of the sporadic and infrequent updates of the inferior actions and the lack of Lipschitz continuity. We instead develop a novel sample-path-wise approach to show the convergence.
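To make the setting concrete, the following is a minimal simulation sketch of a blindfolded game, not the construction analyzed in the paper. It rests on several assumptions that do not come from the source: rewards are Bernoulli with means given by hypothetical payoff matrices `PAYOFF_1` and `PAYOFF_2`, each player keeps independent Beta(1, 1) priors per action, and the game is a prisoner's-dilemma-like example in which action 1 strictly dominates, so that (1, 1) is the unique Nash equilibrium and (0, 0) would be the collusive outcome. The names `ThompsonPlayer`, `play_blindfolded_game`, and `nash_fraction` are illustrative only.

```python
import random

# Hypothetical mean payoffs, indexed [own action][opponent action].
# Action 1 strictly dominates action 0, so (1, 1) is the unique Nash
# equilibrium, while (0, 0) would pay both players more (the "collusive" outcome).
PAYOFF_1 = [[0.6, 0.1], [0.9, 0.4]]
PAYOFF_2 = [[0.6, 0.1], [0.9, 0.4]]


class ThompsonPlayer:
    """Beta-Bernoulli Thompson sampling over a player's own actions.

    The player only sees its own binary reward; it has no notion of an opponent.
    """

    def __init__(self, n_actions, rng):
        self.rng = rng
        self.successes = [1] * n_actions  # Beta(1, 1) prior (assumption)
        self.failures = [1] * n_actions

    def choose(self):
        # Draw one posterior sample per action and play the largest.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward):
        # Only the played action is updated, so an inferior action
        # receives updates sporadically.
        if reward:
            self.successes[action] += 1
        else:
            self.failures[action] += 1


def play_blindfolded_game(n_rounds=5000, seed=0):
    """Simulate the repeated game and return the list of joint actions."""
    rng = random.Random(seed)
    p1 = ThompsonPlayer(2, rng)
    p2 = ThompsonPlayer(2, rng)
    history = []
    for _ in range(n_rounds):
        a1, a2 = p1.choose(), p2.choose()
        # Bernoulli rewards whose means depend on the joint action.
        r1 = rng.random() < PAYOFF_1[a1][a2]
        r2 = rng.random() < PAYOFF_2[a2][a1]
        p1.update(a1, r1)
        p2.update(a2, r2)
        history.append((a1, a2))
    return history


def nash_fraction(history, nash=(1, 1)):
    """Fraction of rounds in the second half that played the Nash profile."""
    tail = history[len(history) // 2:]
    return sum(1 for joint in tail if joint == nash) / len(tail)


if __name__ == "__main__":
    print(nash_fraction(play_blindfolded_game()))
```

In this toy example one would expect the printed fraction to be high, which is consistent with the convergence claim but is only an empirical observation about the sketch. Because each player's reward distribution shifts as the opponent learns, the posteriors here are built from non-stationary data, which gives some intuition for why standard arguments need care.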