Asynchronous Stochastic Gradient Descent (Asynchronous SGD) is a cornerstone method for parallelizing learning in distributed machine learning. However, its performance suffers under arbitrarily heterogeneous computation times across workers, leading to suboptimal time complexity and inefficiency as the number of workers scales. While several Asynchronous SGD variants have been proposed, recent findings by Tyurin & Richtárik (NeurIPS 2023) reveal that none achieve optimal time complexity, leaving a significant gap in the literature. In this paper, we propose Ringmaster ASGD, a novel Asynchronous SGD method designed to address these limitations and tame the inherent challenges of Asynchronous SGD. We establish, through rigorous theoretical analysis, that Ringmaster ASGD achieves optimal time complexity under arbitrarily heterogeneous and dynamically fluctuating worker computation times. This makes it the first Asynchronous SGD method to meet the theoretical lower bounds for time complexity in such scenarios.
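To make the setting concrete, below is a minimal sketch of plain Asynchronous SGD (not the paper's Ringmaster ASGD), simulating a server that applies possibly stale gradients as workers with arbitrarily heterogeneous compute times finish. The toy quadratic objective, worker speeds, and all names are illustrative assumptions, not taken from the paper.

```python
# Minimal simulation of vanilla Asynchronous SGD with heterogeneous workers.
# Objective, worker speeds, and hyperparameters are illustrative assumptions.
import heapq
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, lr, horizon = 10, 4, 0.05, 2000.0
x = rng.normal(size=d)                      # server's current model
compute_times = [1.0, 2.0, 5.0, 50.0]       # arbitrarily heterogeneous worker speeds

def stoch_grad(model):
    # gradient of f(x) = 0.5 * ||x||^2 plus Gaussian noise
    return model + 0.1 * rng.normal(size=d)

# each worker starts computing a gradient at the initial model copy
events = []  # (finish_time, worker_id, model_copy_used)
for w in range(n_workers):
    heapq.heappush(events, (compute_times[w], w, x.copy()))

applied = 0
while events:
    t, w, x_stale = heapq.heappop(events)   # earliest finishing worker
    if t > horizon:
        break
    # server applies the (possibly stale) gradient as soon as it arrives
    x -= lr * stoch_grad(x_stale)
    applied += 1
    # worker w immediately starts a new gradient at the fresh model
    heapq.heappush(events, (t + compute_times[w], w, x.copy()))

print(f"applied {applied} asynchronous updates, final ||x|| = {np.linalg.norm(x):.4f}")
```

The sketch highlights the failure mode the abstract refers to: the slow worker keeps injecting gradients computed at badly outdated models, which is exactly the staleness problem that Ringmaster ASGD is designed to control.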