Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
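To make the notion of a delayed gradient update concrete, the following is a minimal sketch of \texttt{GDA} with a fixed delay on a toy strongly convex-strongly concave quadratic. It is an illustration under stated assumptions, not one of the paper's experiments or its exact algorithm. The objective $f(x, y) = \tfrac{\mu}{2}x^2 + bxy - \tfrac{\mu}{2}y^2$, the constants `MU` and `B`, the fixed delay `tau`, the constant-history initialization, and the function name `delayed_gda` are all hypothetical choices made for the example.

```python
from collections import deque
import math

MU, B = 1.0, 1.0  # hypothetical curvature and coupling constants


def grad(x, y):
    """Gradients of f(x, y) = MU/2*x^2 + B*x*y - MU/2*y^2 (saddle point at the origin)."""
    return MU * x + B * y, B * x - MU * y


def delayed_gda(eta, tau, steps, x0=1.0, y0=1.0):
    """Run GDA where every update uses the gradient at the iterate from `tau` steps ago.

    Assumption: a fixed delay `tau`, with the history before step 0 set to the start point.
    Returns the distance of the final iterate from the saddle point (0, 0).
    """
    # Sliding window of the last tau+1 iterates; history[0] is the iterate tau steps back.
    history = deque([(x0, y0)] * (tau + 1), maxlen=tau + 1)
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad(*history[0])         # stale gradient
        x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y
        history.append((x, y))             # the oldest iterate is dropped automatically
    return math.hypot(x, y)


if __name__ == "__main__":
    for eta, tau, steps in [(0.5, 0, 200), (0.5, 5, 200), (0.01, 5, 3000)]:
        dist = delayed_gda(eta, tau, steps)
        print(f"eta={eta}, tau={tau}, steps={steps}: distance to saddle = {dist:.3e}")
```

On this toy instance, the step size `eta=0.5` drives the iterates to the saddle point when `tau=0`, whereas the same step size with `tau=5` makes them grow. Shrinking the step size to `eta=0.01` restores convergence under the same delay, at the cost of many more iterations. This mirrors, in a simplified setting, the delay-induced slow-down that the complexity bounds quantify.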