To alleviate the shortage of computing power that clients face when training deep neural networks (DNNs) with federated learning (FL), we leverage edge computing and split learning to propose a model-splitting-allowed FL (SFL) framework, with the aim of minimizing the training latency without loss of test accuracy. Under the synchronized global update setting, the latency of one round of global training is determined by the maximum latency among the clients in completing a local training session. The training latency minimization problem (TLMP) is therefore modelled as a min-max problem. To solve this mixed-integer nonlinear programming problem, we first propose a regression method to fit the quantitative relationship between the cut layer and the other parameters of an AI model, and thus transform the TLMP into a continuous problem. Because the two subproblems involved in the TLMP, namely the cut-layer selection problem for the clients and the computing resource allocation problem for the parameter server, are relatively independent, we develop an alternate-optimization-based algorithm with polynomial time complexity to obtain a high-quality solution to the TLMP. Extensive experiments are performed on the popular DNN model EfficientNetV2 using the MNIST dataset, and the results verify the validity and improved performance of the proposed SFL framework.
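The alternating structure described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the latency model (client compute, one forward transfer of the cut-layer output, server compute), the `Setup` container, and all function names are assumptions made for illustration, and the cut-layer step enumerates the discrete cut layers directly instead of using the regression-based continuous relaxation the abstract mentions. Backward-pass traffic is ignored.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class Setup:
    """Hypothetical toy system description (not taken from the paper)."""
    layer_flops: np.ndarray  # compute cost of each layer
    act_bits: np.ndarray     # size of each layer's output
    f_client: np.ndarray     # compute speed of each client
    rate: np.ndarray         # uplink rate of each client
    f_server: float          # total compute speed of the parameter server


def latencies(s, cuts, shares):
    """Latency per client when the client runs layers [0, cut) and the server the rest."""
    cum = np.concatenate(([0.0], np.cumsum(s.layer_flops)))
    client = cum[cuts] / s.f_client
    comm = s.act_bits[cuts - 1] / s.rate            # output of the last client-side layer
    server = (cum[-1] - cum[cuts]) / (shares * s.f_server)
    return client + comm + server


def allocate_server(s, cuts, iters=60):
    """Fixed cuts: bisect on the common deadline T that uses the whole server."""
    cum = np.concatenate(([0.0], np.cumsum(s.layer_flops)))
    fixed = cum[cuts] / s.f_client + s.act_bits[cuts - 1] / s.rate
    work = cum[-1] - cum[cuts]                      # > 0 because every cut < L
    lo, hi = fixed.max(), fixed.max() + work.sum() / s.f_server
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        need = np.sum(work / (s.f_server * (mid - fixed)))  # total share required
        if need > 1.0:
            lo = mid                                # deadline too tight
        else:
            hi = mid                                # deadline achievable
    shares = work / (s.f_server * (hi - fixed))
    return shares / shares.sum()                    # hand out any leftover capacity


def select_cuts(s, shares):
    """Fixed shares: each client independently picks its own best cut layer."""
    candidates = np.arange(1, len(s.layer_flops))   # keep work on both sides
    cuts = np.empty(len(shares), dtype=int)
    for i in range(len(shares)):
        one = Setup(s.layer_flops, s.act_bits, s.f_client[i], s.rate[i], s.f_server)
        cuts[i] = candidates[np.argmin(latencies(one, candidates, shares[i]))]
    return cuts


def alternate(s, max_rounds=20):
    """Alternate between the two subproblems until the cuts stop changing."""
    cuts = np.full(len(s.f_client), 1)
    history = []                                    # max latency after each allocation step
    for _ in range(max_rounds):
        shares = allocate_server(s, cuts)
        history.append(latencies(s, cuts, shares).max())
        new_cuts = select_cuts(s, shares)
        if np.array_equal(new_cuts, cuts):
            break
        cuts = new_cuts
    return cuts, allocate_server(s, cuts), history


if __name__ == "__main__":
    demo = Setup(
        layer_flops=np.array([4.0, 8.0, 8.0, 6.0, 2.0]),
        act_bits=np.array([6.0, 3.0, 2.0, 1.0, 0.5]),
        f_client=np.array([1.0, 2.0, 0.5]),
        rate=np.array([2.0, 1.0, 4.0]),
        f_server=20.0,
    )
    cuts, shares, history = alternate(demo)
    print("cut layers:", cuts)
    print("server shares:", shares)
    print("max latency per round:", history)
```

In this toy model neither step can increase the maximum latency: the allocation step is optimal for the current cuts, and the cut step lowers (or keeps) every client's own latency. This is why the two subproblems can be handled in turn.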