Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning ($\texttt{DSpodFL}$), a DFL methodology built on a generalized notion of $\textit{sporadicity}$ in both local gradient and aggregation processes. $\texttt{DSpodFL}$ subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing $\textit{heterogeneous and time-varying}$ computation/communication scenarios. We analytically characterize the convergence behavior of $\texttt{DSpodFL}$ for both convex and non-convex models and for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noise. We show how our bounds recover existing results from decentralized gradient descent as special cases. Experiments demonstrate that $\texttt{DSpodFL}$ consistently achieves faster training speeds than baselines under various system settings.
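To make the notion of sporadicity concrete, the following is a minimal illustrative update rule consistent with the description above; the symbols $v_i^{(k)}$, $r_{ij}^{(k)}$, $w_{ij}$, $\alpha^{(k)}$, and $\beta$ are our illustrative notation rather than the paper's exact formulation. Let $v_i^{(k)} \in \{0,1\}$ indicate whether client $i$ computes a local gradient at iteration $k$, and let $r_{ij}^{(k)} \in \{0,1\}$ indicate whether clients $i$ and $j$ exchange models at iteration $k$. Client $i$'s model could then evolve as
\[
\theta_i^{(k+1)} \;=\; \theta_i^{(k)} \;+\; \beta \sum_{j \neq i} r_{ij}^{(k)} \, w_{ij} \big( \theta_j^{(k)} - \theta_i^{(k)} \big) \;-\; \alpha^{(k)} \, v_i^{(k)} \, \widehat{\nabla} F_i\big( \theta_i^{(k)} \big),
\]
where $w_{ij}$ are mixing weights over the communication graph and $\widehat{\nabla} F_i$ denotes client $i$'s (possibly noisy) local gradient. Under this sketch, setting $v_i^{(k)} \equiv 1$ for all clients and letting $r_{ij}^{(k)}$ follow a fixed, fully participating mixing schedule would recover standard decentralized gradient descent as a special case.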