We study the alternating gradient descent-ascent (AltGDA) algorithm in two-player zero-sum games. Alternating methods, in which players take turns updating their strategies, have long been recognized as simple and practical approaches for learning in games, exhibiting much better numerical performance than their simultaneous counterparts. However, our theoretical understanding of alternating algorithms remains limited, and existing results are mostly restricted to the unconstrained setting. We show that for two-player zero-sum games that admit an interior Nash equilibrium, AltGDA converges at an $O(1/T)$ ergodic rate when employing a small constant stepsize. This is the first result showing that alternation improves over the simultaneous counterpart of GDA in the constrained setting. For games without an interior equilibrium, we show an $O(1/T)$ local convergence rate with a constant stepsize that is independent of any game-specific constants. In a more general setting, we develop a performance estimation programming (PEP) framework to jointly optimize the AltGDA stepsize along with its worst-case convergence rate. The PEP results indicate that AltGDA may achieve an $O(1/T)$ convergence rate over a finite horizon $T$, whereas its simultaneous counterpart appears limited to an $O(1/\sqrt{T})$ rate.
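The abstract itself gives no pseudocode, but the AltGDA update rule is standard: each player takes a projected gradient step, with the second player reacting to the first player's *already updated* strategy rather than the stale one. A minimal sketch for a bilinear zero-sum game $f(x, y) = x^\top A y$ over probability simplexes is shown below; the matching-pennies payoff matrix, the stepsize $\eta = 0.1$, and the iteration count are illustrative assumptions, not values taken from the paper. Note the game has an interior equilibrium at the uniform strategies, the setting in which the paper proves the $O(1/T)$ ergodic rate.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    # Largest index rho with u[rho] + (1 - css[rho]) / (rho + 1) > 0
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# Matching pennies: interior Nash equilibrium at x* = y* = (0.5, 0.5).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
x = np.array([0.9, 0.1])   # min player
y = np.array([0.2, 0.8])   # max player
eta = 0.1                  # small constant stepsize (assumed value)

xs, ys = [], []
for t in range(5000):
    # Alternation: x moves first, then y responds to the *new* x.
    x = project_simplex(x - eta * (A @ y))
    y = project_simplex(y + eta * (A.T @ x))   # simultaneous GDA would use the old x here
    xs.append(x)
    ys.append(y)

# Ergodic (time-averaged) iterates, the quantity the O(1/T) rate concerns.
x_bar = np.mean(xs, axis=0)
y_bar = np.mean(ys, axis=0)
print(x_bar, y_bar)
```

On this example the ergodic averages drift toward the uniform equilibrium, while the only change needed to recover simultaneous GDA is using the pre-update `x` in the `y` step.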