This paper addresses the problem of computing equilibria in monotone games. Traditional Follow the Regularized Leader (FTRL) algorithms fail to converge to an equilibrium even in two-player zero-sum games. Although optimistic versions of these algorithms have been proposed with last-iterate convergence guarantees, they require noiseless gradient feedback. To overcome this limitation, we present a novel framework that achieves last-iterate convergence even in the presence of noise. Our key idea is to perturb, or regularize, the payoffs (utilities) of the game. This perturbation pulls the current strategy toward an anchor strategy, which we refer to as a "slingshot" strategy. First, we establish the convergence rates of our framework to a stationary point near an equilibrium, both with and without noise. Next, we introduce an approach that periodically replaces the slingshot strategy with the current strategy. We interpret this approach as a proximal point method and demonstrate its last-iterate convergence. Our framework is comprehensive: it incorporates existing payoff-regularized algorithms and enables the development of new algorithms with last-iterate convergence properties. Finally, we show that algorithms based on this framework empirically exhibit faster convergence.
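The two ingredients described above, a payoff perturbation that pulls the strategy toward a slingshot and a periodic reset of the slingshot to the current strategy, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the game (rock-paper-scissors), the squared-Euclidean form of the perturbation, the multiplicative-weights step (an entropy-regularized FTRL update), the Gaussian gradient noise, and all parameter values and function names.

```python
import numpy as np

def exploitability(A, x, y):
    # Sum of both players' best-response gains; zero exactly at a Nash equilibrium.
    return float(np.max(A @ y) - np.min(A.T @ x))

def perturbed_mwu(A, x0, y0, eta=0.05, mu=0.5, period=500, rounds=20,
                  noise_std=0.1, seed=0):
    """Hypothetical sketch: multiplicative weights with a slingshot perturbation.

    Player 1 maximizes x^T A y, player 2 minimizes it. All defaults are
    illustrative choices, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    sx, sy = x.copy(), y.copy()  # slingshot (anchor) strategies
    for _ in range(rounds):
        for _ in range(period):
            # Noisy gradient feedback for each player.
            gx = A @ y + noise_std * rng.standard_normal(x.size)
            gy = -A.T @ x + noise_std * rng.standard_normal(y.size)
            # Perturbation (assumed squared-distance form): pull toward the slingshot.
            gx -= mu * (x - sx)
            gy -= mu * (y - sy)
            # Multiplicative-weights step followed by renormalization onto the simplex.
            x = x * np.exp(eta * gx)
            x /= x.sum()
            y = y * np.exp(eta * gy)
            y /= y.sum()
        # Periodically replace the slingshot with the current strategy.
        sx, sy = x.copy(), y.copy()
    return x, y

# Rock-paper-scissors payoff matrix for player 1; the unique equilibrium is uniform.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x0 = np.array([0.6, 0.3, 0.1])
y0 = np.array([0.2, 0.5, 0.3])

x, y = perturbed_mwu(A, x0, y0)
print("initial exploitability:", exploitability(A, x0, y0))
print("final exploitability:  ", exploitability(A, x, y))
```

Setting `mu=0` recovers plain multiplicative weights, which gives a simple way to compare the last iterate with and without the perturbation in this toy setting.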