Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are carried out exclusively by the clients, without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning (DSpodFL), a DFL methodology built on a generalized notion of sporadicity in both the local gradient and aggregation processes. DSpodFL subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing heterogeneous and time-varying computation/communication scenarios. We analytically characterize the convergence behavior of DSpodFL for both convex and non-convex models, under both constant and diminishing learning rates, and under mild assumptions on communication graph connectivity, data heterogeneity across clients, and gradient noise. We also show how our bounds recover existing results as special cases. Experiments demonstrate that DSpodFL consistently achieves faster training than baselines under various system settings.
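A toy simulation can illustrate the per-iteration indicator mechanism described above. This is a minimal sketch under stated assumptions and is not the authors' implementation. Each client i holds a scalar model and a hypothetical quadratic loss f_i(x) = ½(x − b_i)². Gradient steps fire independently with probability `p_grad[i]`. Each undirected link is used with probability `p_comm`, and an active link moves both endpoints toward each other by a fixed weight `mix`. The function name `dspodfl_sketch` and all of its parameters are illustrative. When every probability is 1, one iteration becomes a standard decentralized gradient descent step with mixing matrix I − mix·L, where L is the graph Laplacian. Setting `p_comm = 0` reduces the scheme to isolated local gradient descent.

```python
import random


def dspodfl_sketch(targets, edges, p_grad, p_comm, lr=0.05, mix=0.3,
                   iters=3000, seed=0):
    """Hypothetical sporadic decentralized GD on scalar quadratic losses.

    Client i holds f_i(x) = 0.5 * (x - targets[i])**2, whose gradient is
    x - targets[i]. Each iteration draws independent Bernoulli indicators:
    one per client (gradient step with prob. p_grad[i]) and one per
    undirected edge (model exchange with prob. p_comm).
    Assumes mix * max_degree <= 1, so each averaging step is a convex
    combination of neighboring models.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for _ in range(iters):
        # Client indicators: does client i take a gradient step this round?
        grad_on = [rng.random() < p_grad[i] for i in range(n)]
        # Link indicators: is edge (i, j) used for a model exchange this round?
        active = [(i, j) for (i, j) in edges if rng.random() < p_comm]
        new_x = list(x)
        # Pairwise averaging on active links, computed from previous iterates.
        for i, j in active:
            diff = x[j] - x[i]
            new_x[i] += mix * diff
            new_x[j] -= mix * diff
        # Local gradient step only for clients whose indicator fired.
        for i in range(n):
            if grad_on[i]:
                new_x[i] -= lr * (x[i] - targets[i])
        x = new_x
    return x


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    models = dspodfl_sketch([1.0, 2.0, 3.0, 4.0], ring,
                            p_grad=[0.8] * 4, p_comm=0.5)
    print([round(v, 3) for v in models])
```

With a constant learning rate and heterogeneous targets, the models settle in a neighborhood of the global minimizer (here, the mean of the targets) and do not reach exact consensus. This is the usual behavior of decentralized gradient methods with constant step sizes. In this naive form, unequal `p_grad` values also weight the clients' objectives unequally in expectation, which is why the usage example uses equal probabilities.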