Training latency is critical to the success of many emerging applications enabled by federated learning (FL) over heterogeneous mobile devices. By overlapping local gradient transmission with continued local computing, FL can markedly reduce its training latency over homogeneous clients, yet it suffers from severe model staleness, model drift, memory cost, and straggler issues in heterogeneous environments. To unleash the full potential of overlapping, we propose FedEx, a novel \underline{fed}erated learning approach that \underline{ex}pedites FL training over mobile devices under data, computing, and wireless heterogeneity. FedEx redefines the overlapping procedure with staleness ceilings to bound memory consumption and to make overlapping compatible with participation selection (PS) designs. FedEx then characterizes the PS utility function by accounting for the latency saved by overlapping, and provides a holistic PS solution to address the straggler issue. FedEx further introduces a simple yet effective metric to decide when to trigger overlapping, so as to avoid model drift. Experimental results show that, compared with peer designs, FedEx achieves substantial reductions in FL training latency over heterogeneous mobile devices at limited memory cost.