Distributed stochastic optimization has drawn considerable attention recently owing to its effectiveness in solving large-scale machine learning problems. Although numerous algorithms have been proposed and successfully applied to practical problems, their theoretical guarantees rely mainly on boundedness conditions on the stochastic gradients, ranging from uniform boundedness to the relaxed growth condition. In addition, how to characterize the data heterogeneity among agents, and its impact on algorithmic performance, remains challenging. Motivated by these issues, we revisit the classical Federated Averaging (FedAvg) algorithm (McMahan et al., 2017) and the more recent SCAFFOLD method (Karimireddy et al., 2020) for solving the distributed stochastic optimization problem. We establish convergence results for smooth nonconvex objective functions under only a mild variance condition on the stochastic gradients. Under the same condition, we also establish almost sure convergence to a stationary point. Moreover, we discuss a more informative measure of data heterogeneity and its implications.
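The sketch below is a minimal illustration of why data heterogeneity matters for these two methods. It contrasts FedAvg's plain local SGD with SCAFFOLD's control-variate (drift-corrected) local steps on a hypothetical two-client quadratic problem. Everything here is an illustrative assumption and is not the setting analyzed in the paper: the problem data `A` and `B`, the step size, the number of local steps, full client participation, a global step size of 1, and additive Gaussian gradient noise. The control-variate update follows the "option II" rule described by Karimireddy et al. (2020).

```python
import random

# Hypothetical toy problem (illustrative assumption, not from the paper):
# client i holds f_i(x) = 0.5 * a_i * (x - b_i)**2, and the global objective is
# f(x) = (1/n) * sum_i f_i(x). Different (a_i, b_i) pairs model data heterogeneity.
A = [1.0, 10.0]
B = [0.0, 1.0]
X_STAR = sum(a * b for a, b in zip(A, B)) / sum(A)  # exact minimizer of f


def stoch_grad(i, x, sigma, rng):
    """Gradient of f_i at x plus zero-mean Gaussian noise with variance sigma**2."""
    return A[i] * (x - B[i]) + sigma * rng.gauss(0.0, 1.0)


def fedavg(rounds=200, local_steps=10, lr=0.01, sigma=0.0, seed=0):
    """FedAvg: each client runs local SGD from x, then the server averages the models."""
    rng = random.Random(seed)
    n = len(A)
    x = 0.0
    for _ in range(rounds):
        local_models = []
        for i in range(n):
            y = x
            for _ in range(local_steps):
                y -= lr * stoch_grad(i, y, sigma, rng)
            local_models.append(y)
        x = sum(local_models) / n  # server step: plain averaging
    return x


def scaffold(rounds=200, local_steps=10, lr=0.01, sigma=0.0, seed=0):
    """SCAFFOLD with full participation and global step size 1."""
    rng = random.Random(seed)
    n = len(A)
    x, c = 0.0, 0.0          # server model and server control variate
    c_loc = [0.0] * n        # client control variates c_i
    for _ in range(rounds):
        dys, dcs = [], []
        for i in range(n):
            y = x
            for _ in range(local_steps):
                # Drift-corrected local step: g_i(y) - c_i + c.
                y -= lr * (stoch_grad(i, y, sigma, rng) - c_loc[i] + c)
            # "Option II" control update: c_i+ = c_i - c + (x - y) / (K * lr).
            c_new = c_loc[i] - c + (x - y) / (local_steps * lr)
            dys.append(y - x)
            dcs.append(c_new - c_loc[i])
            c_loc[i] = c_new
        x += sum(dys) / n  # server model update
        c += sum(dcs) / n  # server control update (keeps c = mean of c_i)
    return x


if __name__ == "__main__":
    print(f"x*       = {X_STAR:.4f}")      # → x*       = 0.9091
    print(f"FedAvg   = {fedavg():.4f}")    # → FedAvg   = 0.8720
    print(f"SCAFFOLD = {scaffold():.4f}")  # → SCAFFOLD = 0.9091
```

With exact gradients (`sigma=0.0`), a constant step size, and several local steps, FedAvg settles at a point biased away from the minimizer because each client drifts toward its own optimum. SCAFFOLD's control variates cancel that drift, so it recovers the minimizer. This is the standard client-drift picture. It does not reproduce the paper's variance condition, step-size choices, or proposed heterogeneity measure. Passing `sigma > 0` adds gradient noise if you want to observe the stochastic behavior.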