Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle the components and use them sequentially (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-{\L}ojasiewicz (P{\L}) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-P{\L} and primal-P{\L}-P{\L} objectives, and obtain convergence rates faster than those of with-replacement SGDA. Our rates extend to mini-batch SGDA-RR and recover known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a comprehensive lower bound for GDA with an arbitrary step-size ratio, which matches the full-batch upper bound for the primal-P{\L}-P{\L} case.
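To make the sampling scheme concrete, the following is a minimal sketch of simultaneous and alternating SGDA-RR on a toy finite-sum problem. It is not the paper's implementation: the function name `sgda_rr_toy`, the component functions f_i(x, y) = 0.5 (x - a_i)^2 + x y - 0.5 (y - b_i)^2, the data `a` and `b`, and the step sizes are all assumptions chosen for illustration. The toy objective is strongly-convex-strongly-concave, which is much easier than the nonconvex-P{\L} setting studied here, and is used only so that the saddle point is known in closed form.

```python
import numpy as np

def sgda_rr_toy(a, b, alpha, beta, epochs, alternating=False, seed=0):
    """SGDA with random reshuffling on the hypothetical components
    f_i(x, y) = 0.5*(x - a_i)**2 + x*y - 0.5*(y - b_i)**2.
    x takes descent steps (size alpha), y takes ascent steps (size beta)."""
    rng = np.random.default_rng(seed)
    n = len(a)
    x, y = 0.0, 0.0
    for _ in range(epochs):
        # Reshuffle once per epoch, then use every component exactly once
        # (without-replacement sampling).
        for i in rng.permutation(n):
            gx = (x - a[i]) + y            # partial derivative of f_i in x
            if alternating:
                x = x - alpha * gx
                gy = x - (y - b[i])        # evaluated at the updated x
                y = y + beta * gy
            else:
                gy = x - (y - b[i])        # evaluated at the old x
                x = x - alpha * gx
                y = y + beta * gy
    return x, y

# Illustrative data (assumed, not from the paper).
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 1.0, 1.0])

# Saddle point of the averaged objective: x + y = mean(a), x - y = -mean(b).
x_star = (a.mean() - b.mean()) / 2
y_star = (a.mean() + b.mean()) / 2

x_sim, y_sim = sgda_rr_toy(a, b, alpha=0.01, beta=0.01, epochs=2000)
x_alt, y_alt = sgda_rr_toy(a, b, alpha=0.01, beta=0.01, epochs=2000,
                           alternating=True)
print("saddle point:", x_star, y_star)
print("simultaneous:", x_sim, y_sim)
print("alternating: ", x_alt, y_alt)
```

The only difference between the two variants is which value of x the ascent step sees. Replacing `rng.permutation(n)` with `rng.integers(0, n, size=n)` would give with-replacement SGDA for comparison, and the ratio `alpha / beta` plays the role of the step-size ratio mentioned above.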