In distributed stochastic optimization, where parallel and asynchronous methods are employed, we establish optimal time complexities under virtually any computation behavior of workers/devices/CPUs/GPUs, capturing potential disconnections due to hardware and network delays, time-varying computation powers, and any possible fluctuations and trends of computation speeds. These real-world scenarios are formalized by our new universal computation model. Leveraging this model and new proof techniques, we discover tight lower bounds that apply to virtually all synchronous and asynchronous methods, including Minibatch SGD, Asynchronous SGD (Recht et al., 2011), and Picky SGD (Cohen et al., 2021). We show that these lower bounds, up to constant factors, are matched by the optimal Rennala SGD and Malenia SGD methods (Tyurin & Richtárik, 2023).