Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages, including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best known last-iterate convergence rate for OMWU depends on a game-dependent constant that can be arbitrarily large. This raises the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small $\delta>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/\delta$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.
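The quantities discussed above can be made concrete with a small self-contained sketch (not the paper's lower-bound construction): OMWU in self-play on a $2\times 2$ zero-sum game, reporting the duality gap of the last iterate. The game matrix, step size, and horizon are illustrative choices, not values from the paper.

```python
import numpy as np

def duality_gap(A, x, y):
    # For min_x max_y x^T A y, the duality gap of (x, y) is
    # max_{y'} x^T A y' - min_{x'} x'^T A y, attained at pure strategies.
    return float(np.max(A.T @ x) - np.min(A @ y))

def omwu(A, T=5000, eta=0.05):
    """Optimistic multiplicative weights update in self-play on the
    bilinear zero-sum game min_x max_y x^T A y (x minimizes, y maximizes)."""
    n, m = A.shape
    x, y = np.ones(n) / n, np.ones(m) / m   # start from uniform strategies
    loss_x_prev, gain_y_prev = A @ y, A.T @ x
    for _ in range(T):
        loss_x, gain_y = A @ y, A.T @ x
        # Optimism: feed the one-step prediction 2*current - previous
        # into the usual multiplicative-weights exponential update.
        x = x * np.exp(-eta * (2 * loss_x - loss_x_prev))
        x /= x.sum()
        y = y * np.exp(+eta * (2 * gain_y - gain_y_prev))
        y /= y.sum()
        loss_x_prev, gain_y_prev = loss_x, gain_y
    return x, y

# Illustrative game with a unique interior equilibrium x* = y* = (1/3, 2/3).
A = np.array([[3.0, -1.0], [-1.0, 1.0]])
x, y = omwu(A)
print(duality_gap(A, x, y))  # small for this well-behaved instance
```

On games with a unique interior equilibrium, as here, the last iterate of OMWU does converge; the paper's point is that on adversarially chosen $2\times 2$ games this gap can instead stay constant for $1/\delta$ rounds.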