In practical distributed systems, workers are typically not homogeneous and, owing to differences in hardware configurations and network conditions, can have highly varying processing times. We consider smooth nonconvex finite-sum (empirical risk minimization) problems in this setup and introduce a new parallel method, Freya PAGE, designed to handle arbitrarily heterogeneous and asynchronous computations. By being robust to "stragglers" and adaptively ignoring slow computations, Freya PAGE offers significantly improved time complexity guarantees compared to all previous methods, including Asynchronous SGD, Rennala SGD, SPIDER, and PAGE, while requiring weaker assumptions. The algorithm relies on novel generic stochastic gradient collection strategies with theoretical guarantees; these strategies may be of independent interest and may be used in the design of future optimization methods. Furthermore, we establish a lower bound for smooth nonconvex finite-sum problems in the asynchronous setup, providing a fundamental time complexity limit. This lower bound is tight and demonstrates the optimality of Freya PAGE in the large-scale regime, i.e., when $\sqrt{m} \geq n$, where $n$ is the number of workers and $m$ is the number of data samples.
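To illustrate what ignoring slow computations can mean in practice, the following is a minimal sketch and not the Freya PAGE algorithm itself. It assumes, purely for illustration, that worker $i$ needs a fixed time `taus[i]` per stochastic gradient, and it computes how long it takes to gather a given number of gradients when whichever gradients arrive first are accepted. The function name `time_to_collect` and the example timings are hypothetical.

```python
import heapq


def time_to_collect(taus, num_gradients):
    """Earliest time at which `num_gradients` gradients have arrived.

    Assumption (illustrative only): worker i finishes one gradient every
    taus[i] seconds, and all workers compute in parallel without waiting
    for each other. Returns (finish_time, gradients_per_worker).
    """
    counts = [0] * len(taus)
    if num_gradients <= 0:
        return 0.0, counts
    # Min-heap of (time of the worker's next completed gradient, worker index).
    heap = [(tau, i) for i, tau in enumerate(taus)]
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(num_gradients):
        # Accept the next gradient to arrive, whichever worker produced it.
        finish, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (finish + taus[i], i))
    return finish, counts


# Hypothetical timings: two fast workers and one straggler.
taus = [1.0, 2.0, 100.0]
finish, counts = time_to_collect(taus, 6)

# Baseline for comparison: every worker must deliver 2 gradients (6 in total),
# so the round lasts as long as the slowest worker needs.
sync_time = 2 * max(taus)

print(finish, counts, sync_time)
```

In this toy setting the straggler contributes nothing to the batch, and the collection time is governed by the fast workers rather than by the slowest one.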