Minimax optimization plays an important role in many machine learning tasks, such as generative adversarial networks (GANs) and adversarial training. Although a wide variety of optimization methods have recently been proposed to solve minimax problems, most of them ignore the distributed setting, where the data is spread across multiple workers. Meanwhile, existing decentralized minimax optimization methods rely on strict assumptions such as (strong) concavity or variational inequality conditions. In this paper, we therefore propose an efficient decentralized momentum-based gradient descent ascent (DM-GDA) method for distributed nonconvex-PL minimax optimization, in which the objective is nonconvex in the primal variable and nonconcave in the dual variable, but satisfies the Polyak-Lojasiewicz (PL) condition with respect to the dual variable. In particular, DM-GDA uses momentum-based techniques both to update the variables and to estimate the stochastic gradients. Moreover, we provide a solid convergence analysis for DM-GDA and prove that it attains a near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of nonconvex-PL stochastic minimax problems, which matches the lower bound for nonconvex stochastic optimization. To the best of our knowledge, this is the first study of a decentralized algorithm for nonconvex-PL stochastic minimax optimization over a network.
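For concreteness, the nonconvex-PL setting described above can be written out as follows. This is a sketch in standard notation rather than the paper's exact formulation: the number of workers $m$, the local objectives $f_i$, the random samples $\xi^i$, the PL constant $\mu > 0$, and the primal function $F$ are all symbols assumed here for illustration. The distributed stochastic minimax problem over $m$ workers is

$$
\min_{x \in \mathbb{R}^{d}} \max_{y \in \mathbb{R}^{p}} \; f(x, y) = \frac{1}{m} \sum_{i=1}^{m} f_i(x, y), \qquad f_i(x, y) = \mathbb{E}_{\xi^i}\big[ f_i(x, y; \xi^i) \big],
$$

where $f$ may be nonconvex in $x$ and nonconcave in $y$. The PL condition in the dual variable asks that, for every fixed $x$ and every $y$,

$$
\big\| \nabla_y f(x, y) \big\|^{2} \;\ge\; 2\mu \Big( \max_{y'} f(x, y') - f(x, y) \Big).
$$

A commonly used notion of an $\epsilon$-stationary solution in this setting, again assumed here rather than taken from the text, is a point $x$ with $\mathbb{E}\,\|\nabla F(x)\| \le \epsilon$, where $F(x) = \max_{y} f(x, y)$.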