The efficiency of Federated Learning (FL) is often affected by both data and device heterogeneity. Data heterogeneity is defined as the heterogeneity of data distributions across different clients. Device heterogeneity is defined as the clients' varying latencies in uploading their local model updates, caused by heterogeneous local hardware resources, and it leads to the problem of staleness when addressed by asynchronous FL. Traditional schemes for tackling the impact of staleness treat data and device heterogeneity as two separate and independent aspects of FL, but this assumption is unrealistic in many practical FL scenarios where the two are intertwined. In these cases, traditional weighted-aggregation schemes in FL have proven ineffective, and a better approach is to convert a stale model update into a non-stale one. In this paper, we present a new FL framework that leverages the gradient inversion technique for such conversion, thereby efficiently tackling unlimited staleness in clients' model updates. Our basic idea is to use gradient inversion to obtain estimates of clients' local training data from their uploaded stale model updates, and to use these estimates to compute non-stale client model updates. In this way, we mitigate the possible drop in data quality caused by gradient inversion while still preserving the clients' local data privacy. We compared our approach with existing FL strategies on mainstream datasets and models, and experimental results demonstrate that when tackling unlimited staleness, our approach can improve the trained model's accuracy by up to 20% and speed up FL training progress by up to 35%.
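To illustrate the gradient inversion idea the framework builds on, below is a minimal, self-contained sketch of a well-known special case: for a single fully connected layer z = Wx + b with any loss L(z), the uploaded gradients satisfy grad_b = dL/dz and grad_W = outer(dL/dz, x), so the client's input x can be read off exactly from one nonzero row. This is a toy illustration of the inversion principle, not the paper's actual method (which optimizes estimates from full stale model updates); all names and values are illustrative.

```python
def forward_grads(W, b, x, target):
    """Client-side gradients of L = 0.5 * sum((Wx + b - target)^2)."""
    z = [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(b))]
    dz = [z[i] - target[i] for i in range(len(b))]                 # dL/dz
    grad_W = [[dz[i] * x[j] for j in range(len(x))] for i in range(len(b))]
    grad_b = dz[:]                                                 # dL/db = dL/dz
    return grad_W, grad_b

# "Client" computes gradients on its private sample (unknown to the server).
W = [[0.2, -0.5, 1.0], [0.7, 0.1, -0.3]]
b = [0.1, -0.2]
x_private = [1.5, -2.0, 0.5]
grad_W, grad_b = forward_grads(W, b, x_private, target=[1.0, 0.0])

# "Server" reconstructs the input from the uploaded gradients alone:
# grad_W[i][j] = grad_b[i] * x[j], so dividing a row of grad_W by the
# matching entry of grad_b recovers x (up to floating-point rounding).
i = next(k for k, g in enumerate(grad_b) if abs(g) > 1e-12)
x_recovered = [gw / grad_b[i] for gw in grad_W[i]]
print(x_recovered)
```

In the paper's setting the estimate obtained this way (or by iterative gradient matching for deeper models) is then used to recompute an up-to-date, non-stale model update on the server, rather than to expose the client's raw data.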