In this paper, we consider distributed optimization problems in which $n$ agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a connected network. To solve this problem, we propose a distributed random reshuffling (D-RR) algorithm that invokes the random reshuffling (RR) update at each agent. We show that D-RR inherits the favorable characteristics of RR for both smooth strongly convex and smooth nonconvex objective functions. In particular, for smooth strongly convex objective functions, D-RR achieves an $\mathcal{O}(1/T^2)$ rate of convergence (where $T$ denotes the number of epochs) in terms of the squared distance between the iterate and the global minimizer. When the objective function is smooth and nonconvex, we show that D-RR drives the squared norm of the gradient to $0$ at a rate of $\mathcal{O}(1/T^{2/3})$. These convergence results match those of centralized RR (up to constant factors) and outperform the distributed stochastic gradient descent (DSGD) algorithm when the number of epochs is relatively large. Finally, we conduct a set of numerical experiments to illustrate the efficiency of the proposed D-RR method on both strongly convex and nonconvex distributed optimization problems.
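To make the idea of invoking an RR update at each agent concrete, the following is a minimal sketch and not the paper's reference implementation. The abstract does not state the update rule, so the sketch assumes a gossip-style step in which every agent takes a gradient step on its next locally reshuffled sample and then averages with its neighbors through a doubly stochastic mixing matrix `W`. The least-squares local costs, the constant step size, and the names `ring_mixing_matrix` and `d_rr` are illustrative assumptions.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        # Each agent weights itself and its two ring neighbors equally.
        W[i, i] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
    return W


def d_rr(A, b, W, step_size, epochs, seed=0):
    """Sketch of distributed random reshuffling on local least-squares costs.

    A: (n, m, d) features and b: (n, m) targets; agent i holds (A[i], b[i]).
    W: (n, n) doubly stochastic mixing matrix of a connected network.
    Returns an (n, d) array with one local iterate per agent.
    """
    rng = np.random.default_rng(seed)
    n, m, d = A.shape
    agents = np.arange(n)
    x = np.zeros((n, d))
    for _ in range(epochs):
        # Every agent draws its own permutation of its m local samples.
        perms = np.stack([rng.permutation(m) for _ in range(n)])
        for step in range(m):
            idx = perms[:, step]
            a = A[agents, idx]                              # (n, d) current samples
            resid = np.einsum("nd,nd->n", a, x) - b[agents, idx]
            grad = a * resid[:, None]                       # grad of 0.5 * (a @ x - b)**2
            # Local gradient step followed by neighbor averaging (assumed update form).
            x = W @ (x - step_size * grad)
    return x
```

Within an epoch each agent visits every local sample exactly once, which is what distinguishes reshuffling from the with-replacement sampling used in DSGD. The constant step size here is chosen for simplicity; the rates quoted in the abstract depend on the step-size choices analyzed in the paper.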