We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems of the form $\min_{\mathbf{x}} \max_{\mathbf{y} \in Y} f(\mathbf{x}, \mathbf{y})$, where the objective function $f(\mathbf{x}, \mathbf{y})$ is nonconvex in $\mathbf{x}$ and concave in $\mathbf{y}$, and the constraint set $Y \subseteq \mathbb{R}^n$ is convex and bounded. In the convex-concave setting, the single-timescale gradient descent ascent (GDA) algorithm is widely used in applications and has been shown to have strong convergence guarantees. In more general settings, however, it can fail to converge. Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting, efficiently finding a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in Y} f(\cdot, \mathbf{y})$. We also establish theoretical bounds on the complexity of solving both smooth and nonsmooth nonconvex-concave minimax optimization problems. To the best of our knowledge, this is the first systematic analysis of TTGDA for nonconvex minimax optimization, shedding light on its superior performance in training generative adversarial networks (GANs) and in other real-world applications.
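The two-timescale idea described above can be sketched as follows: the ascent step on $\mathbf{y}$ uses a larger stepsize than the descent step on $\mathbf{x}$, and $\mathbf{y}$ is projected back onto the bounded convex set $Y$ after each update. This is a minimal illustrative sketch, not the paper's exact algorithm; the test objective $f(x, y) = xy - \tfrac{1}{2}y^2$ (concave in $y$), the box constraint $Y = [-1, 1]$, and the stepsize values are all assumptions chosen for demonstration.

```python
import numpy as np

def project_box(y, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box Y = [lo, hi]^n (an assumed choice of Y)."""
    return np.clip(y, lo, hi)

def ttgda(grad_x, grad_y, x0, y0, eta_x=0.01, eta_y=0.1, iters=2000):
    """Two-timescale GDA sketch: slow descent on x, fast projected ascent on y.

    eta_y > eta_x is the key two-timescale choice, letting y approximately
    track the inner maximizer while x moves slowly.
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(iters):
        gx = grad_x(x, y)
        gy = grad_y(x, y)
        x = x - eta_x * gx               # descent step on x (slow timescale)
        y = project_box(y + eta_y * gy)  # projected ascent step on y (fast timescale)
    return x, y

# Illustrative objective f(x, y) = x*y - 0.5*y^2, concave in y.
# Here Phi(x) = max_y f(x, y) = 0.5*x^2, whose stationary point is x = 0.
gx = lambda x, y: y
gy = lambda x, y: x - y
x_star, y_star = ttgda(gx, gy, x0=0.5, y0=0.0)
```

On this toy problem the iterates drive $x$ toward the stationary point of $\Phi$, with $y$ tracking the inner best response $y^*(x) = x$ along the way.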