Distributed stochastic optimization has drawn great attention recently due to its effectiveness in solving large-scale machine learning problems. Although numerous algorithms have been proposed and successfully applied to practical problems, their theoretical guarantees mainly rely on boundedness conditions on the stochastic gradients, ranging from uniform boundedness to the relaxed growth condition. In addition, characterizing the data heterogeneity among the agents and its impact on algorithmic performance remains challenging. Motivated by these gaps, we revisit the classical Federated Averaging (FedAvg) algorithm for solving the distributed stochastic optimization problem and establish convergence results for smooth nonconvex objective functions under only a mild variance condition on the stochastic gradients. Almost sure convergence to a stationary point is also established under the same condition. Moreover, we discuss a more informative measure of data heterogeneity and its implications.
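For readers unfamiliar with FedAvg, the following is a minimal sketch of its usual pattern: each agent runs a few local stochastic gradient steps from the current global model, and a server averages the results. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar quadratic local objectives `f_i(x) = 0.5 * (x - c_i)**2`, the Gaussian gradient noise, and the names `fedavg`, `centers`, `local_steps`, `step_size`, and `noise_std`. The paper itself treats smooth nonconvex objectives and may analyze a different variant of the algorithm.

```python
import random


def fedavg(centers, rounds=50, local_steps=5, step_size=0.1,
           noise_std=0.0, x0=0.0, seed=0):
    """Toy FedAvg on scalar quadratics f_i(x) = 0.5 * (x - centers[i])**2.

    Hypothetical illustration only: objectives, noise model, and
    parameter names are assumptions, not the paper's setup.
    """
    rng = random.Random(seed)
    x_global = x0
    for _ in range(rounds):
        local_models = []
        for c in centers:
            x = x_global  # each agent starts from the current global model
            for _ in range(local_steps):
                # stochastic gradient = exact gradient + zero-mean Gaussian noise
                grad = (x - c) + rng.gauss(0.0, noise_std)
                x -= step_size * grad
            local_models.append(x)
        # server step: unweighted average of the agents' local models
        x_global = sum(local_models) / len(local_models)
    return x_global


if __name__ == "__main__":
    centers = [1.0, 2.0, 3.0, 6.0]  # different c_i stand in for heterogeneous data
    print(fedavg(centers, noise_std=0.0))
    print(fedavg(centers, noise_std=0.1, seed=1))
```

In this toy setting all local objectives share the same curvature, so with `noise_std=0.0` the global iterate approaches the mean of `centers`, which minimizes the average objective. The spread of the `c_i` values is only a crude stand-in for data heterogeneity and is not the measure discussed in the paper.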