Last-iterate behaviors of learning algorithms in repeated two-player zero-sum games have been extensively studied due to their wide applications in machine learning and related tasks. Typical algorithms that exhibit last-iterate convergence include the optimistic and extra-gradient methods. However, most existing results establish this property under the assumption that the game is time-independent. Recently, Feng et al. (2023) studied the last-iterate behaviors of the optimistic and extra-gradient methods in games with a time-varying payoff matrix, proving that in unconstrained periodic games the extra-gradient method converges to the equilibrium while the optimistic method diverges. This finding challenges the conventional wisdom that the two methods should behave similarly, as they do in time-independent games. However, compared to unconstrained games, games with constraints are more common in both practical and theoretical studies. In this paper, we investigate the last-iterate behaviors of the optimistic and extra-gradient methods in constrained periodic games, and demonstrate that a similar separation in last-iterate convergence also holds in this setting.
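For concreteness, the two update rules can be sketched on a one-dimensional unconstrained bilinear game f(x, y) = x · a_t · y with a period-2 payoff coefficient a_t. This is a minimal illustrative sketch: the step size, the payoff sequence, and all function names below are assumptions for illustration, not taken from the paper.

```python
def payoff(t):
    # Illustrative periodic payoff coefficient (period 2): alternates 1 and 2.
    return 1.0 if t % 2 == 0 else 2.0

def grad(t, x, y):
    # Gradient field of the min-max problem min_x max_y x * a_t * y:
    # (df/dx, -df/dy), so both players take descent steps on it.
    a = payoff(t)
    return a * y, -a * x

def extra_gradient(x, y, eta, steps):
    # Extra-gradient: an extrapolation step, then an update step using the
    # gradient evaluated at the extrapolated point (same round t for both).
    for t in range(steps):
        gx, gy = grad(t, x, y)
        xh, yh = x - eta * gx, y - eta * gy      # extrapolation step
        gxh, gyh = grad(t, xh, yh)
        x, y = x - eta * gxh, y - eta * gyh      # update step
    return x, y

def optimistic(x, y, eta, steps):
    # Optimistic gradient: z_{t+1} = z_t - eta * (2 F(z_t) - F(z_{t-1})),
    # initialized with F(z_{-1}) = F(z_0).
    gx_prev, gy_prev = grad(0, x, y)
    for t in range(steps):
        gx, gy = grad(t, x, y)
        x, y = x - 2 * eta * gx + eta * gx_prev, y - 2 * eta * gy + eta * gy_prev
        gx_prev, gy_prev = gx, gy
    return x, y

print(extra_gradient(1.0, 1.0, 0.1, 200))
print(optimistic(1.0, 1.0, 0.1, 200))
```

In this toy run the extra-gradient iterate contracts toward the equilibrium (0, 0) at every step; whether the optimistic iterate instead drifts away is precisely the kind of separation the periodic-game analysis is concerned with.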