This paper considers repeated games in which one player has more information about the game than the other players. In particular, we investigate repeated two-player zero-sum games where only the column player knows the payoff matrix A of the game. Suppose that, while repeatedly playing this game, the row player chooses her strategy at each round by using a no-regret algorithm to minimize her (pseudo-)regret. We develop a no-instant-regret algorithm for the column player that exhibits last-round convergence to a minimax equilibrium. We show that our algorithm is efficient against a large class of popular no-regret algorithms that the row player may use, including the multiplicative weight update algorithm, online mirror descent / follow-the-regularized-leader, the linear multiplicative weight update algorithm, and the optimistic multiplicative weight update algorithm.
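The abstract does not spell out the column player's algorithm, so the sketch below only illustrates the row player's side of the setting: one of the no-regret learners listed above, the multiplicative weight update (MWU), run against a fixed column strategy. It assumes, as a convention chosen for illustration, that the row player minimizes the expected loss x^T A y, so pure row i incurs loss (A y)_i; the function name `mwu_step` and the learning rate `eta` are hypothetical. This is a minimal sketch, not the paper's method.

```python
import math


def mwu_step(x, A, y, eta):
    """One multiplicative weight update for the row player.

    Assumption (illustrative convention): the row player minimizes x^T A y,
    so pure row strategy i incurs loss (A y)_i against column mix y.
    """
    # Expected loss of each pure row strategy against the column mix y.
    losses = [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in A]
    # Multiplicative reweighting: rows with higher loss shrink faster.
    w = [x_i * math.exp(-eta * l_i) for x_i, l_i in zip(x, losses)]
    # Renormalize so the strategy stays a probability vector.
    total = sum(w)
    return [w_i / total for w_i in w]


# Usage: a 2x2 payoff matrix and a fixed (hypothetical) column strategy.
A = [[1.0, 0.0],
     [0.0, 1.0]]
y = [0.8, 0.2]   # column player's mixed strategy, held fixed here
x = [0.5, 0.5]   # uniform initial row strategy
for _ in range(200):
    x = mwu_step(x, A, y, eta=0.1)
print(x)
```

Against a fixed column strategy, MWU shifts weight toward the row with the lowest loss (here the second row, whose loss is 0.2 versus 0.8). The interesting case studied in the paper is when the column player adapts its strategy each round; that adaptive algorithm is not reproduced here.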