Federated edge learning (FEL) trains a global model from the local datasets of terminal nodes. It makes full use of the nodes' computing resources and enables more extensive and efficient machine learning on them while meeting user-information protection requirements. However, FEL performance suffers from long delays or faulty decisions when the master collects only partial gradients because stragglers cannot return correct results within a deadline. Motivated by this, in this paper we propose a novel coded FEL scheme that mitigates stragglers in synchronous gradient computation with a two-stage dynamic scheme: we first start a subset of the workers and let them run for a set duration, and once the first stage completes we start the remaining workers in the second stage. In particular, computation latency and transmission latency are essential to performance and are analyzed quantitatively. We then propose a dynamic coding-coefficient scheme based on historical information, including worker completion times. To optimize FEL performance, a Lyapunov function is designed to maximize fairness in balancing admitted data, and the two-stage dynamic coding scheme is designed to maximize the data arriving among workers. Experiments verify the derived properties and show that the proposed solution achieves better accuracy and resource utilization in the FEL system under practical network parameters and on benchmark datasets.
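The two-stage start described in the abstract can be illustrated with a small timing model. The sketch below is only an illustration under stated assumptions, not the paper's algorithm: it assumes the second stage starts after a fixed delay, that the master can decode the gradient from any `k_needed` returned results, and that each worker's latency is a single number covering both computation and transmission. The function name and every parameter are hypothetical, and the dynamic coding coefficients and Lyapunov-based optimization are not modeled.

```python
import random


def two_stage_finish_time(compute_times, n_first, stage_delay, k_needed):
    """Return the time at which the master holds k_needed results.

    compute_times: per-worker latency (computation plus transmission), assumed known.
    n_first: number of workers started at time 0 (stage one).
    stage_delay: assumed fixed time at which the remaining workers start (stage two).
    k_needed: assumed number of results sufficient to decode the gradient.
    """
    if not 1 <= k_needed <= len(compute_times):
        raise ValueError("k_needed must be between 1 and the number of workers")
    # Stage-one workers finish after their own latency; stage-two workers
    # finish after the stage delay plus their own latency.
    finish = [
        t if i < n_first else stage_delay + t
        for i, t in enumerate(compute_times)
    ]
    # The master is done when the k-th fastest result has arrived.
    return sorted(finish)[k_needed - 1]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical exponentially distributed latencies for 8 workers.
    times = [rng.expovariate(1.0) for _ in range(8)]
    print(two_stage_finish_time(times, n_first=5, stage_delay=0.5, k_needed=4))
```

Varying `n_first` and `stage_delay` in this model shows the trade-off the abstract points at: starting fewer workers in stage one saves resources when the early workers are fast, while a long delay before stage two increases the finish time when stragglers occur.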