Federated learning (FL) is a promising approach to privacy-preserving distributed learning. However, in the FL training pipeline, slow or resource-limited clients (i.e., stragglers) prolong the total training time and degrade performance. Prior work has addressed system heterogeneity, including heterogeneous computing capability and network bandwidth, to mitigate the impact of stragglers. These studies split a model into submodels, but with limited degrees of freedom in the choice of model architecture. We propose nested federated learning (NeFL), a generalized framework that efficiently divides a model into submodels using both depthwise and widthwise scaling. NeFL is implemented by interpreting the forward propagation of a model as solving ordinary differential equations (ODEs) with adaptive step sizes. To address the inconsistency that arises when training multiple submodels with different architectures, we decouple a few parameters from the shared parameters so that each submodel trains its own copy of them. NeFL enables resource-constrained clients to join the FL pipeline effectively and allows the model to be trained on a larger amount of data. Through a series of experiments, we demonstrate that NeFL yields significant performance gains, especially for the worst-case submodel. Furthermore, we demonstrate that NeFL aligns with recent studies in FL on pre-trained models and statistical heterogeneity.
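The abstract does not spell out the mechanics, so the following is only a minimal sketch, under our own assumptions, of how viewing residual blocks as explicit Euler steps of an ODE, x_{i+1} = x_i + h_i f_i(x_i), makes it natural to carve nested submodels out of one model in both depth and width. The toy tanh blocks, the rule that folds a skipped block's step size h_i into the preceding kept block (so the submodel still integrates over the same total "time"), and the "keep the first k channels" width rule are illustrative assumptions, not necessarily NeFL's exact method; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def block(w: Matrix, x: Vector) -> Vector:
    """Toy residual branch f_i(x) = tanh(W_i x) (assumed form)."""
    return [math.tanh(v) for v in matvec(w, x)]


def forward(weights: Sequence[Matrix], steps: Sequence[float], x: Vector) -> Vector:
    """Residual forward pass read as explicit Euler steps:
    x_{i+1} = x_i + h_i * f_i(x_i)."""
    for w, h in zip(weights, steps):
        fx = block(w, x)
        x = [xi + h * fi for xi, fi in zip(x, fx)]
    return x


def depthwise_submodel(
    weights: Sequence[Matrix], steps: Sequence[float], keep: Sequence[int]
) -> Tuple[List[Matrix], List[float]]:
    """Hypothetical depthwise scaling: keep only the blocks listed in `keep`
    and add each skipped block's step size to the preceding kept block, so the
    sum of step sizes (the integration interval) is unchanged."""
    keep_set = set(keep)
    if 0 not in keep_set:
        raise ValueError("keep must include block 0")
    sub_w: List[Matrix] = []
    sub_h: List[float] = []
    for i, (w, h) in enumerate(zip(weights, steps)):
        if i in keep_set:
            sub_w.append(w)  # reuse the full model's weight matrix
            sub_h.append(h)
        else:
            sub_h[-1] += h  # merge the skipped step into the previous kept block
    return sub_w, sub_h


def widthwise_submodel(weights: Sequence[Matrix], ratio: float) -> Tuple[List[Matrix], int]:
    """Hypothetical widthwise scaling: keep the first k = ceil(ratio * d)
    rows and columns of every square d x d weight matrix."""
    d = len(weights[0])
    k = max(1, math.ceil(ratio * d))
    return [[row[:k] for row in w[:k]] for w in weights], k


if __name__ == "__main__":
    d = 4
    # Hypothetical 4-block model with arbitrary 4x4 weights.
    weights = [[[0.1 * (i + j - b) for j in range(d)] for i in range(d)] for b in range(4)]
    steps = [1.0] * 4
    x0 = [1.0, -0.5, 0.25, 0.0]

    full = forward(weights, steps, x0)
    sw, sh = depthwise_submodel(weights, steps, keep=[0, 2])
    shallow = forward(sw, sh, x0)
    ww, k = widthwise_submodel(weights, ratio=0.5)
    narrow = forward(ww, steps, x0[:k])
    print(sh, len(shallow), len(narrow))  # prints: [2.0, 2.0] 4 2
```

In this sketch the submodels are nested in the sense that every weight of a smaller submodel is taken from the full model's weights, which is what lets clients with different resources train overlapping parameters; the per-submodel step sizes are the kind of quantity one might keep submodel-specific, though that mapping is our assumption.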