Distributed stochastic optimization has drawn great attention recently due to its effectiveness in solving large-scale machine learning problems. Although numerous algorithms have been proposed and have shown empirical success, their theoretical guarantees remain restrictive: they rely on boundedness conditions on the stochastic gradients, ranging from uniform boundedness to the relaxed growth condition. In addition, characterizing the data heterogeneity among the agents and its impact on algorithmic performance remains challenging. Motivated by these gaps, we revisit the classical FedAvg algorithm for solving the distributed stochastic optimization problem and establish convergence results for smooth nonconvex objective functions under only a mild variance condition on the stochastic gradients. Almost sure convergence to a stationary point is also established under the same condition. Moreover, we discuss a more informative measure of data heterogeneity as well as its implications.
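For readers unfamiliar with FedAvg, the following is a minimal illustrative sketch of its structure: each agent runs a few local stochastic gradient steps from the current global model, and a server then averages the local models. The setting is an assumption made purely for illustration and is not taken from the paper: scalar quadratic local objectives f_i(x) = 0.5 (x - c_i)^2, additive Gaussian gradient noise, full participation, and a constant step size. The function name `fedavg` and all parameter names are hypothetical. The quadratic objectives are convex, so this toy does not exercise the nonconvex regime the abstract addresses; the differing centers c_i merely stand in for data heterogeneity among agents.

```python
import random


def fedavg(centers, rounds=200, local_steps=5, lr=0.05, noise_std=0.1, seed=0):
    """Toy FedAvg on f_i(x) = 0.5 * (x - centers[i])**2 with noisy gradients.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x_global = 0.0
    for _ in range(rounds):
        local_models = []
        for c in centers:
            x = x_global  # each agent starts from the current global model
            for _ in range(local_steps):
                # Stochastic gradient: exact gradient (x - c) plus zero-mean noise.
                grad = (x - c) + rng.gauss(0.0, noise_std)
                x -= lr * grad
            local_models.append(x)
        # Server step: average the local models (full participation assumed).
        x_global = sum(local_models) / len(local_models)
    return x_global


# Heterogeneous agents: each local objective has a different minimizer.
centers = [-1.0, 0.5, 3.5]
x_hat = fedavg(centers)
# Minimizer of the averaged objective is the mean of the centers.
x_star = sum(centers) / len(centers)
```

In this toy problem all local objectives share the same curvature, so the averaged iterate settles near the minimizer of the averaged objective (the mean of the centers) up to noise. With objectives of differing curvature, local steps would generally introduce a heterogeneity-dependent bias, which is the kind of effect a heterogeneity measure is meant to capture.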