Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

From arXiv, AISTATS 2023. 65 pages, 5 figures, 3 tables. Changes in v2: new results were added (Theorem 2.5 and its corollaries), a few typos were fixed, and more clarifications were added. Changes in v3: AISTATS formatting was applied, and small clarifications were added. Code: https://github.com/hugobb/sgda

Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization problems and variational inequality problems (VIPs) arising in various machine learning tasks. The success of the method has led to several advanced extensions of classical SGDA, including variants with arbitrary sampling, variance reduction, and coordinate randomization, as well as distributed variants with compression, which have been studied extensively in the literature, especially during the last few years. In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have had different applications, and have been developed separately in various communities. The key to our unified framework is a parametric assumption on the stochastic estimates. Via this general theoretical framework, we either recover the sharpest known rates for the known special cases or tighten them. Moreover, to illustrate the flexibility of our approach, we develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA). Although variants of these methods are known for solving minimization problems, they have never been considered or analyzed for solving min-max problems and VIPs. We also demonstrate the most important properties of the new methods through extensive numerical experiments.
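
Writing a min-max problem min_x max_y f(x, y) as a VIP with operator F(z) = (grad_x f, -grad_y f), SGDA takes the step z_{k+1} = z_k - gamma * g_k, where g_k is a stochastic estimate of F(z_k); the methods listed above differ mainly in how g_k is built. The following is a minimal sketch under stated assumptions, not the authors' code from the linked repository: the quadratic test game, the step size, and all function names (make_game, operator_i, sgda, and so on) are hypothetical, and the variance-reduced branch assumes a loopless-SVRG-style estimator carried over from L-SVRG for minimization, which is how we read the idea behind L-SVRGDA.

```python
import numpy as np

def make_game(n=20, d=5, mu=1.0, seed=0):
    """Hypothetical finite-sum game:
    f_i(x, y) = mu/2 |x|^2 + x^T A_i y - mu/2 |y|^2 + b_i^T x - c_i^T y."""
    rng = np.random.default_rng(seed)
    A = 0.5 * rng.standard_normal((n, d, d)) / np.sqrt(d)
    b = rng.standard_normal((n, d))
    c = rng.standard_normal((n, d))
    return A, b, c, mu

def operator_i(z, i, game):
    """F_i(z) = (grad_x f_i, -grad_y f_i), the i-th summand of the VIP operator."""
    A, b, c, mu = game
    d = b.shape[1]
    x, y = z[:d], z[d:]
    gx = mu * x + A[i] @ y + b[i]
    gy = -(A[i].T @ x) + mu * y + c[i]
    return np.concatenate([gx, gy])

def full_operator(z, game):
    """F(z) = (1/n) * sum_i F_i(z)."""
    n = game[1].shape[0]
    return np.mean([operator_i(z, i, game) for i in range(n)], axis=0)

def solution(game):
    """F(z) = M z + r is affine and strongly monotone, so F(z*) = 0 is a linear solve."""
    A, b, c, mu = game
    d = b.shape[1]
    Abar = A.mean(axis=0)
    M = np.block([[mu * np.eye(d), Abar], [-Abar.T, mu * np.eye(d)]])
    r = np.concatenate([b.mean(axis=0), c.mean(axis=0)])
    return np.linalg.solve(M, -r)

def sgda(game, gamma=0.05, iters=5000, variance_reduced=False, p=None, seed=1):
    """Simultaneous SGDA: z <- z - gamma * g, i.e. descent in x and ascent in y."""
    rng = np.random.default_rng(seed)
    n, d = game[1].shape
    p = 1.0 / n if p is None else p
    z = np.zeros(2 * d)
    # Reference point w and its full operator value (used only by the VR branch).
    w, Fw = z.copy(), full_operator(z, game)
    for _ in range(iters):
        j = rng.integers(n)  # uniform sampling of one summand
        if variance_reduced:
            # Loopless-SVRG-style estimator: unbiased, and its variance
            # vanishes as both z and w approach the solution.
            g = operator_i(z, j, game) - operator_i(w, j, game) + Fw
            if rng.random() < p:  # refresh the reference point with probability p
                w, Fw = z.copy(), full_operator(z, game)
        else:
            g = operator_i(z, j, game)  # plain SGDA estimator
        z = z - gamma * g
    return z

game = make_game()
z_star = solution(game)
err_plain = np.linalg.norm(sgda(game) - z_star)
err_vr = np.linalg.norm(sgda(game, variance_reduced=True) - z_star)
print(f"SGDA error: {err_plain:.2e}, variance-reduced error: {err_vr:.2e}")
```

With a constant step size, plain SGDA only reaches a neighborhood of the solution whose size is driven by the variance of F_j at z*, while the variance-reduced estimator removes that variance and converges linearly to z* on this toy problem. Swapping in another estimator (compressed, coordinate-sampled, and so on) only changes how g is computed inside the loop.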

翻译：随机梯度下降上升法（SGDA）是解决各类机器学习任务中出现的极小极大优化与变分不等式问题（VIP）的最著名算法之一。该方法的成功催生了经典SGDA的若干高级扩展，包括具有任意采样、方差缩减、坐标随机化以及带压缩的分布式变体，这些变体在文献中得到了广泛研究，尤其是在过去几年中。本文提出了一种统一的收敛性分析，涵盖了大量随机梯度下降上升方法——此前这些方法需要不同的直觉、具有不同的应用场景，并在不同学术社群中独立发展。我们统一框架的关键在于对随机估计量施加参数化假设。通过这一通用理论框架，我们既恢复了已知特例的最优速率，又对其进行了收紧。此外，为展示方法的灵活性，我们开发了若干SGDA新变体，例如新型方差缩减方法（L-SVRGDA）、带压缩的新型分布式方法（QSGDA、DIANA-SGDA、VR-DIANA-SGDA）以及坐标随机化新方法（SEGA-SGDA）。尽管这些新方法的变体已知用于求解极小化问题，但此前从未被应用于极小极大问题与VIP的求解与分析。我们还通过大量数值实验验证了新方法最重要的性质。

相关内容