We study the sample complexity of identifying an approximate equilibrium in two-player zero-sum $n\times 2$ matrix games. That is, in a sequence of repeated plays, how many rounds must the two players play before reaching an approximate equilibrium (e.g., Nash)? We derive instance-dependent bounds that induce an ordering over game matrices, capturing the intuition that the dynamics of some games converge faster than others. Specifically, we consider a stochastic observation model in which, when the two players choose actions $i$ and $j$, respectively, both observe each other's played actions and a stochastic observation $X_{ij}$ such that $\mathbb E[X_{ij}] = A_{ij}$. To our knowledge, this work is the first to give instance-dependent lower bounds on the number of rounds the players must play before reaching an approximate equilibrium, in the sense that the number of rounds depends on the specific properties of the game matrix $A$ as well as on the desired accuracy. We also prove a converse statement: there exist player strategies that achieve this lower bound.
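As a rough illustration of the setting only (not the strategies or bounds of this work), the following Python sketch simulates the observation model and measures how far a strategy pair is from equilibrium. Several details are assumptions not stated in the abstract: the noise is taken to be Gaussian with standard deviation `sigma`, the row player is taken to be the maximizer, entries are sampled uniformly rather than adaptively, and the function names `observe`, `empirical_means`, and `duality_gap` are hypothetical.

```python
import random


def observe(A, i, j, rng, sigma=1.0):
    """Return one noisy observation X_ij with E[X_ij] = A[i][j].

    Assumption: additive Gaussian noise; the abstract only fixes the mean.
    """
    return A[i][j] + rng.gauss(0.0, sigma)


def empirical_means(A, samples_per_entry, rng, sigma=1.0):
    """Estimate A by averaging a fixed number of observations per entry.

    This is naive uniform sampling, used here only as a baseline.
    """
    n, m = len(A), len(A[0])
    return [
        [
            sum(observe(A, i, j, rng, sigma) for _ in range(samples_per_entry))
            / samples_per_entry
            for j in range(m)
        ]
        for i in range(n)
    ]


def duality_gap(A, x, y):
    """Duality gap of mixed strategies (x, y), assuming the row player maximizes x^T A y.

    The gap is zero exactly at a Nash equilibrium; a gap of at most eps
    means neither player can gain more than eps by deviating.
    """
    n, m = len(A), len(A[0])
    # Payoff of each pure row action against the column mix y.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    # Payoff of the row mix x against each pure column action.
    col_payoffs = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
    return max(row_payoffs) - min(col_payoffs)


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]  # a 3 x 2 example game
    A_hat = empirical_means(A, samples_per_entry=200, rng=rng)
    x, y = [0.5, 0.5, 0.0], [0.5, 0.5]
    print("gap on true matrix:", duality_gap(A, x, y))
    print("gap on estimated matrix:", duality_gap(A_hat, x, y))
```

Under these assumptions, the number of observations needed before the estimated matrix yields a strategy pair with a small gap on the true matrix is the kind of quantity that instance-dependent sample-complexity bounds characterize.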