Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has recently gained popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show empirically that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points in both the convex-concave and the strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
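To make the notion of a delayed gradient update concrete, the following is a minimal sketch and not the experiment or the exact update rule analyzed in this work. It runs \texttt{GDA} with a fixed delay on a toy strongly convex-strongly concave objective, f(x, y) = (mu/2) x^2 + x y - (mu/2) y^2, whose saddle point is the origin. The objective, the names `operator` and `delayed_gda`, the fixed delay `tau`, the step sizes, and the choice to reuse the initial point while fewer than `tau` iterates exist are all assumptions made for illustration.

```python
import numpy as np


def operator(z, mu=1.0):
    """Descent-ascent field (df/dx, -df/dy) of the toy objective
    f(x, y) = (mu/2) x^2 + x*y - (mu/2) y^2 (an illustrative assumption)."""
    x, y = z
    return np.array([mu * x + y, -x + mu * y])


def delayed_gda(z0, eta, tau, steps, mu=1.0):
    """GDA in which the gradient at step t is evaluated at the stale
    iterate z_{t - tau}. For t < tau the initial point is reused
    (a hypothetical convention chosen for this sketch)."""
    history = [np.asarray(z0, dtype=float)]
    for t in range(steps):
        stale = history[max(t - tau, 0)]  # iterate from tau steps ago
        history.append(history[-1] - eta * operator(stale, mu))
    return history[-1]


if __name__ == "__main__":
    z0 = [1.0, 1.0]
    for eta, tau in [(0.2, 0), (0.2, 5), (0.05, 5)]:
        dist = np.linalg.norm(delayed_gda(z0, eta, tau, steps=600))
        print(f"eta={eta}, tau={tau}: distance to saddle point = {dist:.3e}")
```

On this toy instance, the delay-free run with step size 0.2 contracts toward the saddle point, the run with the same step size and a delay of 5 moves away from it, and reducing the step size to 0.05 restores contraction at a slower rate. This only illustrates, for one hand-picked problem, how a delay can force a smaller step size; it does not reproduce the reported \texttt{EG} behavior.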