Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$) and its variants, are the most popular approaches for solving large-scale two-player zero-sum games in practice. Whereas algorithms such as optimistic gradient descent ascent have strong last-iterate and ergodic convergence properties for zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence for numerical optimization and its relevance for modeling real-world learning in games, we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants, such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: extragradient RM$^+$ and smooth predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms and show that they enjoy linear-rate last-iterate convergence.
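To make the distinction between the last iterate and the average (ergodic) iterate concrete, the following is a minimal sketch of simultaneous RM$^+$ on a $3\times 3$ matrix game, assuming the row player minimizes $x^\top A y$ and the column player maximizes it. The payoff matrix `A` is an arbitrary illustrative choice (not taken from the paper), and the function names `rm_plus_strategy`, `duality_gap`, and `simultaneous_rm_plus` are hypothetical. It implements only the basic simultaneous variant, not the extragradient, smooth predictive, or restarted variants.

```python
import numpy as np


def rm_plus_strategy(R):
    """Map a nonnegative regret vector to a strategy (uniform if all zero)."""
    total = R.sum()
    return R / total if total > 0 else np.full(R.shape, 1.0 / R.size)


def duality_gap(A, x, y):
    """Gap of (x, y) when x minimizes x^T A y and y maximizes it."""
    return float(np.max(A.T @ x) - np.min(A @ y))


def simultaneous_rm_plus(A, T):
    """Run T rounds of simultaneous RM+ with uniform averaging.

    Returns the averaged strategies and the duality gap of every
    individual (last) iterate, so the two can be compared.
    """
    n, m = A.shape
    Rx, Ry = np.zeros(n), np.zeros(m)        # thresholded cumulative regrets
    x_sum, y_sum = np.zeros(n), np.zeros(m)  # running sums for the averages
    last_iterate_gaps = []
    for _ in range(T):
        x, y = rm_plus_strategy(Rx), rm_plus_strategy(Ry)
        loss_x = A @ y       # loss of each row action against y
        util_y = A.T @ x     # utility of each column action against x
        # Both players update from the same round's strategies (simultaneous).
        Rx = np.maximum(Rx + (x @ loss_x - loss_x), 0.0)
        Ry = np.maximum(Ry + (util_y - y @ util_y), 0.0)
        x_sum += x
        y_sum += y
        last_iterate_gaps.append(duality_gap(A, x, y))
    return x_sum / T, y_sum / T, last_iterate_gaps


# Illustrative payoff matrix (an assumption, chosen arbitrarily).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 1.0, 1.0],
              [0.0, 1.0, -2.0]])

x_avg, y_avg, gaps = simultaneous_rm_plus(A, 20000)
print("average-iterate gap:", duality_gap(A, x_avg, y_avg))
print("final last-iterate gap:", gaps[-1])
```

The duality gap of the averaged strategies is bounded by the sum of the two players' regrets divided by the number of rounds, so it shrinks at a $1/\sqrt{T}$ rate on any matrix game. The per-round values in `gaps` carry no such guarantee, and inspecting them is one way to observe how the last iterate behaves on a given instance.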