We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$, and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory, and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsizes can converge to limit cycles or even diverge in a general setting. In this paper, we present complexity results for two-time-scale GDA applied to nonconvex-concave minimax problems, showing that the algorithm can efficiently find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$. To the best of our knowledge, this is the first nonasymptotic analysis of two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and other real applications.
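To make the two-time-scale idea concrete, the following is a minimal sketch of projected GDA in which $\mathbf{x}$ takes a small step and $\mathbf{y}$ a much larger one. The function name `two_time_scale_gda`, the toy objective $f(x, y) = y \sin(x) - \tfrac{1}{2} y^2$ with $\mathcal{Y} = [-1, 1]$, and the stepsizes `eta_x` and `eta_y` are all illustrative assumptions rather than the choices analyzed in the paper. The toy objective is strongly concave in $y$, a special case of the concave setting, chosen so that $\Phi(x) = \tfrac{1}{2} \sin^2(x)$ is available in closed form.

```python
import numpy as np


def two_time_scale_gda(grad_x, grad_y, project_y, x0, y0, eta_x, eta_y, num_iters):
    """Simultaneous projected GDA with separate stepsizes for x and y.

    Hypothetical helper for illustration: eta_x << eta_y gives the
    two-time-scale behavior (slow descent in x, fast ascent in y).
    """
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    for _ in range(num_iters):
        # Both gradients are evaluated at the current iterate (x, y).
        gx = grad_x(x, y)
        gy = grad_y(x, y)
        x = x - eta_x * gx             # descent step on x (slow time scale)
        y = project_y(y + eta_y * gy)  # projected ascent step on y (fast time scale)
    return x, y


# Toy problem (an assumption, not from the paper):
# f(x, y) = y * sin(x) - 0.5 * y**2 with Y = [-1, 1].
grad_x = lambda x, y: y * np.cos(x)          # partial derivative of f in x
grad_y = lambda x, y: np.sin(x) - y          # partial derivative of f in y
project_y = lambda y: np.clip(y, -1.0, 1.0)  # Euclidean projection onto Y

x_final, y_final = two_time_scale_gda(
    grad_x, grad_y, project_y,
    x0=1.0, y0=0.0, eta_x=0.01, eta_y=0.5, num_iters=5000,
)

# For this toy problem Phi(x) = 0.5 * sin(x)**2, so Phi'(x) = sin(x) * cos(x).
grad_phi = np.sin(x_final) * np.cos(x_final)
print("x:", float(x_final), "y:", float(y_final), "|Phi'(x)|:", abs(float(grad_phi)))
```

In this sketch the stationarity measure is simply $|\Phi'(x)|$, which is only available because the inner maximization has a closed form; in the general concave case $\Phi$ may be nonsmooth and a different notion of stationarity is needed.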