In this paper, we study a class of nonconvex-nonconcave minimax optimization problems (i.e., $\min_x\max_y f(x,y)$), where $f(x,y)$ is possibly nonconvex in $x$, and is nonconcave in $y$ but satisfies the Polyak-Lojasiewicz (PL) condition in $y$. We propose a class of enhanced momentum-based gradient descent ascent methods (i.e., MSGDA and AdaMSGDA) to solve these stochastic nonconvex-PL minimax problems. In particular, our AdaMSGDA algorithm can use various adaptive learning rates in updating the variables $x$ and $y$ without relying on any global or coordinate-wise adaptive learning rates. Theoretically, we present an effective convergence analysis framework for our methods. Specifically, we prove that our MSGDA and AdaMSGDA methods achieve the best known sample (gradient) complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution (i.e., $\mathbb{E}\|\nabla F(x)\|\leq \epsilon$, where $F(x)=\max_y f(x,y)$), while requiring only one sample per iteration. This manuscript commemorates the mathematician Boris Polyak (1935-2023).
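To make the setting concrete, the following is a minimal sketch of a single-sample, momentum-based stochastic gradient descent ascent loop on a toy nonconvex-PL problem. It is not the MSGDA or AdaMSGDA update from the paper, whose exact form the abstract does not give. As an assumption, it uses a generic STORM-style variance-reduced momentum estimator with constant step sizes. The toy objective $f(x,y) = h(x) - g(y-x)$, with $h(x) = \tfrac{1}{2}x^2 + \tfrac{3}{2}\sin^2 x$ and $g(u) = u^2 + 3\sin^2 u$, is also hypothetical. Here $g$ is a commonly cited example of a nonconvex function satisfying the PL condition, so $f$ is nonconcave but PL in $y$, and $F(x) = \max_y f(x,y) = h(x)$. The names `grad_f`, `grad_F`, and `momentum_sgda`, and all parameter values, are illustrative choices.

```python
import math
import random


def grad_f(x, y, noise):
    """Noisy partial gradients of the toy objective f(x, y) = h(x) - g(y - x)."""
    u = y - x
    dg = 2.0 * u + 3.0 * math.sin(2.0 * u)  # g'(u) for g(u) = u^2 + 3 sin^2(u)
    dh = x + 1.5 * math.sin(2.0 * x)        # h'(x) for h(x) = 0.5 x^2 + 1.5 sin^2(x)
    # d/dx [-g(y - x)] = +g'(u) and d/dy [-g(y - x)] = -g'(u)
    return dh + dg + noise[0], -dg + noise[1]


def grad_F(x):
    """Gradient of F(x) = max_y f(x, y) = h(x) for this toy problem."""
    return x + 1.5 * math.sin(2.0 * x)


def momentum_sgda(x0, y0, steps=3000, gamma=0.01, lam=0.05,
                  alpha=0.1, sigma=0.1, seed=0):
    """Single-sample GDA with a STORM-style momentum estimator (assumed form)."""
    rng = random.Random(seed)
    x, y = x0, y0
    noise = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    v, w = grad_f(x, y, noise)  # initial gradient estimates for x and y
    for _ in range(steps):
        x_new = x - gamma * v   # descent step on x
        y_new = y + lam * w     # ascent step on y
        # One fresh sample per iteration, evaluated at both the new and old iterates.
        noise = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        gx_new, gy_new = grad_f(x_new, y_new, noise)
        gx_old, gy_old = grad_f(x, y, noise)
        # Momentum update: new gradient plus a damped correction of the old error.
        v = gx_new + (1.0 - alpha) * (v - gx_old)
        w = gy_new + (1.0 - alpha) * (w - gy_old)
        x, y = x_new, y_new
    return x, y


if __name__ == "__main__":
    x, y = momentum_sgda(2.0, 0.0)
    print("x =", x, "y =", y, "|grad F(x)| =", abs(grad_F(x)))
```

The stationarity measure printed at the end, $|\nabla F(x)|$, is the one-dimensional analogue of the $\epsilon$-stationarity criterion $\mathbb{E}\|\nabla F(x)\|\leq \epsilon$. The larger step size for $y$ than for $x$ is a common two-timescale choice, assumed here rather than taken from the paper.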