Regret matching (RM), together with its modern variants, is a foundational online algorithm that has been at the heart of many AI breakthrough results in solving benchmark zero-sum games such as poker. Yet surprisingly little is known in theory about its convergence beyond two-player zero-sum games. For example, whether regret matching converges to Nash equilibria in potential games has been an open problem for two decades. Even beyond games, one can apply RM variants to general constrained optimization problems. Recent empirical evidence suggests that they, particularly regret matching$^+$ (RM$^+$), attain strong performance on benchmark constrained optimization problems, outperforming traditional gradient-descent-type algorithms. We show that alternating RM$^+$ converges to an $\epsilon$-KKT point after $O_\epsilon(1/\epsilon^4)$ iterations, establishing for the first time that it is a sound and fast first-order optimizer. Our argument relates the KKT gap to the accumulated regret, two quantities that are entirely disparate in general but interact in an intriguing way in our setting, so much so that when the regrets are bounded, our complexity bound improves all the way to $O_\epsilon(1/\epsilon^2)$. From a technical standpoint, while RM$^+$ does not have the usual one-step improvement property in general, we show that it does hold in a certain region, which the algorithm reaches quickly and remains in thereafter. In sharp contrast, our second main result establishes a lower bound: RM, with or without alternation, can take an exponential number of iterations to reach even a crude approximate solution in two-player potential games. This is the first worst-case separation between RM and RM$^+$. Our lower bound also shows that convergence to coarse correlated equilibria in potential games is exponentially faster than convergence to Nash equilibria.
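For concreteness, here is a minimal sketch of the two updates compared above, under the standard textbook definitions (the notation $g^t$, $r^t$, $R^t$ is ours, not fixed by the abstract). Given utility-gradient feedback $g^t \in \mathbb{R}^n$ at an iterate $x^t$ in the simplex, both algorithms maintain a regret vector $R^t$ and play proportionally to its positive part:
$$x^t \propto [R^{t-1}]^+, \qquad r^t = g^t - \langle g^t, x^t \rangle \mathbf{1}, \qquad \text{RM: } R^t = R^{t-1} + r^t, \qquad \text{RM}^+\!: R^t = \big[R^{t-1} + r^t\big]^+,$$
where $[\cdot]^+$ denotes the componentwise positive part and an arbitrary point of the simplex (e.g., uniform) is played whenever the proportionality vector is zero. Since RM$^+$ thresholds $R^t$ at zero in every step, its play simplifies to $x^t \propto R^{t-1}$; "alternating" standardly refers to the two sides updating in turn, each responding to the other's most recent iterate, rather than simultaneously.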