Federated learning (FL) enables collaborative model training across distributed clients (e.g., edge devices) without sharing raw data. However, FL can be computationally expensive because each client must train the entire model multiple times. SplitFed learning (SFL) is a recent distributed approach that reduces the computation workload on client devices by splitting the model at a cut layer into two parts, so that clients only need to train part of the model. SFL nevertheless still suffers from the \textit{client drift} problem when clients' data are highly non-IID. To address this issue, we propose MiniBatch-SFL, an algorithm that incorporates MiniBatch SGD into SFL: the clients train the client-side model in an FL fashion, while the server trains the server-side model in a manner similar to MiniBatch SGD. We analyze the convergence of MiniBatch-SFL and show that a bound on the expected loss can be obtained by analyzing the expected server-side and client-side model updates separately. The server-side updates do not depend on the non-IID degree of the clients' datasets and can potentially mitigate client drift. However, the client-side model still depends on the non-IID degree and can be optimized by properly choosing the cut layer. Perhaps counterintuitively, our empirical results show that a later position of the cut layer leads to a smaller average gradient divergence and better algorithm performance. Moreover, numerical results show that MiniBatch-SFL achieves higher accuracy than conventional SFL and FL. With highly non-IID data, the accuracy improvement can be up to 24.1\% over conventional SFL and up to 17.1\% over FL.
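To make the division of labor concrete, the following is a minimal NumPy sketch of one training round in the spirit of the description above. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the two-layer toy model (a tanh client-side layer and a linear server-side layer), the squared loss, the single local step per round, the equal-weight averaging, and the names `make_toy_batches` and `minibatch_sfl_round` are all hypothetical choices made for this example.

```python
import numpy as np


def make_toy_batches(num_clients=3, batch_size=16, d_in=4, seed=0):
    """Hypothetical non-IID data: each client's inputs are shifted by a different mean."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d_in)
    batches = []
    for k in range(num_clients):
        shift = k - (num_clients - 1) / 2.0        # per-client feature shift
        x = rng.normal(loc=shift, size=(batch_size, d_in))
        y = np.tanh(x @ w_true)                    # toy regression target
        batches.append((x, y))
    return batches


def minibatch_sfl_round(w_client, w_server, batches, lr=0.01):
    """One illustrative round: returns (new client weights, new server weights, pooled loss)."""
    # Client-side forward pass up to the cut layer (the "smashed data").
    smashed = [np.tanh(x @ w_client) for x, _ in batches]

    # Server side: one SGD step on the mini-batch pooled from all clients,
    # so this update sees every client's data distribution at once.
    h = np.concatenate(smashed)
    y = np.concatenate([y_k for _, y_k in batches])
    err = h @ w_server - y
    loss = 0.5 * np.mean(err ** 2)                 # loss before this round's update
    new_server = w_server - lr * (h.T @ err) / len(y)

    # Client side: each client backpropagates the gradient returned for its own
    # batch, takes a local step, and the local models are averaged as in FedAvg.
    local_models = []
    for (x, y_k), h_k in zip(batches, smashed):
        err_k = h_k @ w_server - y_k
        grad_h = np.outer(err_k, w_server) / len(y_k)   # gradient at the cut layer
        grad_pre = grad_h * (1.0 - h_k ** 2)            # backprop through tanh
        local_models.append(w_client - lr * (x.T @ grad_pre))
    new_client = np.mean(local_models, axis=0)          # equal-weight averaging (assumption)
    return new_client, new_server, loss
```

In this sketch the server-side step uses the pooled batch and is therefore insensitive to how the data are partitioned across clients, whereas each client-side step uses only that client's own batch, which is where the non-IID degree enters. Calling `minibatch_sfl_round` repeatedly on the batches from `make_toy_batches`, feeding the returned weights back in, gives a small experiment for comparing the two parts.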