In this paper, we introduce an accelerated distributed stochastic gradient method with momentum for solving the distributed optimization problem, in which a group of $n$ agents collaboratively minimizes the average of their local objective functions over a connected network. The method, termed ``Distributed Stochastic Momentum Tracking (DSMT)'', is a single-loop algorithm that combines the momentum tracking technique with the Loopless Chebyshev Acceleration (LCA) method. We show that DSMT asymptotically achieves convergence rates comparable to those of the centralized stochastic gradient descent (SGD) method under a general variance condition on the stochastic gradients. Moreover, the number of iterations (the transient time) required for DSMT to achieve such rates behaves as $\mathcal{O}(n^{5/3}/(1-\lambda))$ for minimizing general smooth objective functions, and as $\mathcal{O}(\sqrt{n/(1-\lambda)})$ under the Polyak-{\L}ojasiewicz (PL) condition. Here, $1-\lambda$ denotes the spectral gap of the mixing matrix associated with the underlying network topology. Notably, these results do not rely on multiple rounds of inter-node communication or on stochastic gradient accumulation per iteration, and, to the best of our knowledge, the transient times are the shortest obtained in this setting.
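For concreteness, the setting described above can be written out as follows. This is a sketch in standard notation rather than the paper's own statement: the dimension $p$, the local objectives $f_i$, the optimal value $f^*$, and the PL constant $\mu > 0$ are symbols assumed here for illustration.
\[
% Assumed notation: x in R^p is the shared decision variable, f_i is agent i's local objective.
\min_{x \in \mathbb{R}^p} f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\]
where each agent $i$ is assumed to query only stochastic gradients of its own $f_i$ and to exchange information only with its neighbors in the network. In its usual form, the PL condition requires
\[
% Assumed notation: f^* is the infimum of f, and mu > 0 is the PL constant.
\|\nabla f(x)\|^2 \ge 2\mu \bigl( f(x) - f^* \bigr) \quad \text{for all } x \in \mathbb{R}^p .
\]
Under either bound, the transient time grows as the spectral gap $1-\lambda$ shrinks, that is, as the network becomes more weakly connected.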