This paper proposes two distributed random reshuffling methods, Gradient Tracking with Random Reshuffling (GT-RR) and Exact Diffusion with Random Reshuffling (ED-RR), for solving the distributed optimization problem over a connected network, where a set of agents aims to minimize the average of their local cost functions. Both algorithms use the random reshuffling (RR) update at each agent, inherit the favorable characteristics of RR for minimizing smooth nonconvex objective functions, and improve on previous distributed random reshuffling methods both theoretically and empirically. Specifically, both GT-RR and ED-RR achieve a convergence rate of $O(1/[(1-\lambda)^{1/3}m^{1/3}T^{2/3}])$ in driving the (minimum) expected squared norm of the gradient to zero, where $T$ denotes the number of epochs, $m$ is the sample size for each agent, and $1-\lambda$ represents the spectral gap of the mixing matrix. When the objective functions further satisfy the Polyak-Łojasiewicz (PL) condition, we show that GT-RR and ED-RR both achieve an $O(1/[(1-\lambda)mT^2])$ convergence rate in terms of the averaged expected difference between the agents' function values and the global minimum value. Notably, both results are comparable to the convergence rates of centralized RR methods (up to constant factors depending on the network topology) and outperform those of previous distributed random reshuffling algorithms. Moreover, we support the theoretical findings with a set of numerical experiments.
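To make the setting concrete, the following is a minimal illustrative sketch of how a gradient tracking recursion can be combined with random reshuffling. It is not taken from the paper and may differ from the actual GT-RR update. The assumptions are all ours: the number of agents `n`, the ring mixing matrix with weights 1/3, the least-squares local costs, the constant step size `alpha`, and every function name are hypothetical choices for illustration. In each epoch, every agent draws its own permutation of its `m` samples and processes them once in that order, while a tracking variable `y` follows the network average of the most recently sampled gradients.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 agents
    (assumed topology: weight 1/3 on self and on each of the two neighbors)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def sample_grad(A, b, x, idx):
    """Per-agent gradient of 0.5 * (a^T x - b)^2 at the sample picked by idx.

    A: (n, m, d) features, b: (n, m) targets, x: (n, d) local iterates,
    idx: (n,) index of the sample each agent uses at this step.
    """
    agents = np.arange(A.shape[0])
    a = A[agents, idx]                                   # (n, d)
    residual = np.einsum("nd,nd->n", a, x) - b[agents, idx]
    return residual[:, None] * a


def gt_rr_sketch(A, b, W, alpha, epochs, rng):
    """Gradient tracking with per-agent random reshuffling (illustrative only)."""
    n, m, d = A.shape
    x = np.zeros((n, d))       # local iterates, one row per agent
    y = np.zeros((n, d))       # gradient trackers
    g_prev = np.zeros((n, d))  # previously sampled gradients
    for _ in range(epochs):
        # Each agent reshuffles its own m samples once per epoch.
        perms = np.stack([rng.permutation(m) for _ in range(n)])
        for step in range(m):
            g = sample_grad(A, b, x, perms[:, step])
            # Tracker update: the average of y equals the average of g.
            y = W @ y + g - g_prev
            # Local descent step followed by mixing with neighbors.
            x = W @ (x - alpha * y)
            g_prev = g
    return x


def demo(seed=0):
    """Noise-free synthetic least-squares problem shared by 4 agents."""
    rng = np.random.default_rng(seed)
    n, m, d = 4, 20, 5
    A = rng.standard_normal((n, m, d))
    x_star = rng.standard_normal(d)
    b = A @ x_star             # consistent targets, shape (n, m)
    x = gt_rr_sketch(A, b, ring_mixing_matrix(n), alpha=0.01, epochs=200, rng=rng)
    return x, x_star


if __name__ == "__main__":
    x, x_star = demo()
    print("max distance to x_star:", np.abs(x - x_star).max())
```

The synthetic targets are noise-free, so every sample gradient vanishes at `x_star` and a constant step size suffices in this toy problem. The rates quoted above concern general smooth nonconvex and PL objectives and are not reproduced by this example.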