We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound of $\Omega(t^{-1/4})$ on the exploitability gap. Several online mirror descent algorithms have been proposed in the literature for this problem, but none has attained this rate so far. We show that log-barrier regularization, combined with a dual-focused analysis, achieves a matching $\tilde{O}(t^{-1/4})$ convergence rate with high probability. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.
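For readers unfamiliar with the quantity being bounded, the following is a minimal sketch of how an exploitability gap can be computed for a pair of mixed strategies in a matrix game. The abstract does not spell out its definition, so the code assumes the standard duality-gap form, with the row player maximizing $x^\top A y$ and the column player minimizing it; the function name `exploitability_gap` and the matching-pennies example are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def exploitability_gap(
    A: Sequence[Sequence[float]],
    x: Sequence[float],
    y: Sequence[float],
) -> float:
    """Duality gap of the strategy pair (x, y) in the matrix game A.

    Assumed convention (not stated in the abstract): the row player
    chooses x to maximize x^T A y, the column player chooses y to
    minimize it. The gap is zero exactly at a minimax equilibrium.
    """
    n_rows, n_cols = len(A), len(A[0])
    # Expected payoff of each pure row action against the column mix y.
    row_payoffs = [
        sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)
    ]
    # Expected payoff of each pure column action against the row mix x.
    col_payoffs = [
        sum(x[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols)
    ]
    # Best-response value for the row player minus best-response value
    # for the column player.
    return max(row_payoffs) - min(col_payoffs)


# Matching pennies: the uniform strategies form the unique equilibrium.
MATCHING_PENNIES = [[1.0, -1.0], [-1.0, 1.0]]
gap_at_equilibrium = exploitability_gap(MATCHING_PENNIES, [0.5, 0.5], [0.5, 0.5])
gap_at_pure_pair = exploitability_gap(MATCHING_PENNIES, [1.0, 0.0], [1.0, 0.0])
```

Under this convention, a last-iterate rate of $\tilde{O}(t^{-1/4})$ means the gap of the strategy pair played at round $t$ itself, rather than of the time-averaged strategies, shrinks at that rate.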