Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
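The divergence phenomenon described above can be illustrated with a minimal sketch. It is not the paper's experimental setup. It assumes the bilinear instance f(x, y) = xy, whose unique saddle point is the origin, and one plausible delay model in which both operator evaluations of \texttt{EG} use information that is a fixed number of steps stale, with the history before step 0 held constant at the initial point. The function names, the step size, and the delay model are all illustrative assumptions.

```python
import math


def saddle_operator(z):
    """Operator F(z) = (df/dx, -df/dy) for the assumed instance f(x, y) = x * y."""
    x, y = z
    return (y, -x)


def step_from(z, direction, step_size):
    """Return z - step_size * direction for 2-D points stored as tuples."""
    return (z[0] - step_size * direction[0], z[1] - step_size * direction[1])


def delayed_extragradient(z0, step_size, delay, num_iters):
    """Extra-gradient with a fixed delay (assumed model, not the paper's).

    At step t the extrapolation uses F evaluated at the iterate from step
    t - delay, and the update uses F evaluated at the half-point from step
    t - delay. Indices before step 0 are clamped to 0. Setting delay = 0
    recovers standard extra-gradient.
    """
    iterates = [tuple(float(v) for v in z0)]
    half_points = []
    for t in range(num_iters):
        stale = max(t - delay, 0)
        # Extrapolation step, driven by a stale iterate.
        half_points.append(
            step_from(iterates[t], saddle_operator(iterates[stale]), step_size)
        )
        # Update step, driven by a stale half-point.
        iterates.append(
            step_from(iterates[t], saddle_operator(half_points[stale]), step_size)
        )
    return iterates[-1]


def distance_to_saddle(z):
    """Euclidean distance from z to the saddle point (0, 0)."""
    return math.hypot(z[0], z[1])


if __name__ == "__main__":
    for delay in (0, 1):
        z_final = delayed_extragradient((1.0, 1.0), 0.5, delay, 200)
        print(f"delay={delay}: distance to saddle = {distance_to_saddle(z_final):.3e}")
```

Under these assumptions, with step size 0.5 the undelayed run contracts toward the origin, while a delay of a single step makes the iterates grow without bound. Other delay models, for example one in which only the extrapolation step is stale, may behave differently.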