Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard the incomplete intra-model updates done by stragglers, alter the amount of local workload or the model architecture, or resort to asynchronous settings; all of these degrade the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF), which leverages the backpropagation-based optimization procedure of NNs to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, so that each layer of the global model is updated independently by a different contributing set of users. We provide a theoretical analysis establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched by empirical observations demonstrating the performance gains of SALF compared to alternative mechanisms that mitigate the device heterogeneity gap in FL.
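To make the layer-wise update rule concrete, the following is a minimal Python sketch of the server-side aggregation, not the authors' reference implementation; the names (salf_aggregate, user_grads, lr) and the plain per-layer mean are illustrative assumptions. It rests on the property stated above: backpropagation computes gradients from the output layer backward, so a straggler halted mid-backprop holds valid gradients for the last layers only, and each layer can be averaged over whichever users reached it in time.

```python
# Minimal sketch of SALF-style layer-wise aggregation (illustrative, not the
# paper's reference code). Each user reports gradients only for the layers it
# finished during backprop; the server updates every layer independently from
# its own contributing set of users.
import numpy as np

def salf_aggregate(global_weights, user_grads, lr=0.1):
    """Update each layer from the subset of users that reported it.

    global_weights: list of np.ndarray, one per layer.
    user_grads: one dict per user, mapping layer index -> gradient array;
                stragglers only include the deepest (last) layers.
    """
    for l, w in enumerate(global_weights):
        # Gather this layer's gradient from every user that computed it.
        contribs = [g[l] for g in user_grads if l in g]
        if contribs:  # skip layers that no user reached before the deadline
            global_weights[l] = w - lr * np.mean(contribs, axis=0)
    return global_weights

# Toy usage: 3 layers, 2 users; user 1 is a straggler whose backward pass was
# cut off after the last two layers (indices 2 and 1), so layer 0 is updated
# from user 0 alone.
weights = [np.ones((2, 2)) for _ in range(3)]
full = {0: np.ones((2, 2)), 1: np.ones((2, 2)), 2: np.ones((2, 2))}
partial = {1: np.ones((2, 2)), 2: np.ones((2, 2))}
weights = salf_aggregate(weights, [full, partial])
```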