Federated learning (FL) is an emerging distributed machine learning method that enables in-situ model training on decentralized edge devices. However, running multiple FL tasks simultaneously can overload resource-constrained devices. In this work, we propose the first FL system that effectively coordinates and trains multiple simultaneous FL tasks. We first formalize the problem of training simultaneous FL tasks. We then present our approach, MAS (Merge and Split), which optimizes the performance of training multiple simultaneous FL tasks. MAS starts by merging the FL tasks into a single all-in-one FL task with a multi-task architecture. After training for a few rounds, MAS splits the all-in-one FL task into two or more FL tasks, using the affinities among tasks measured during the all-in-one training. It then continues training each split, initialized from the model parameters of the all-in-one training. Extensive experiments demonstrate that MAS outperforms other methods while reducing training time by 2x and energy consumption by 40%. We hope this work will inspire the community to further study and optimize the training of simultaneous FL tasks.
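The split step can be illustrated with a minimal sketch. The abstract does not specify how affinities are measured or how a split is chosen, so everything below is an assumption made for illustration: the `affinity[i][j]` table (read as "how much training task i helps task j"), the scoring rule (each task's mean affinity received from its groupmates), and the hypothetical helper names `split_score` and `best_two_way_split`. The sketch exhaustively searches two-way splits, which is only practical for a small number of tasks.

```python
from itertools import combinations


def split_score(groups, affinity):
    """Sum, over all tasks, of the mean affinity each task receives
    from the other tasks in its group (an assumed scoring rule)."""
    total = 0.0
    for group in groups:
        for task in group:
            mates = [other for other in group if other != task]
            if mates:  # a task trained alone contributes nothing
                total += sum(affinity[m][task] for m in mates) / len(mates)
    return total


def best_two_way_split(tasks, affinity):
    """Try every partition of `tasks` into two non-empty groups and
    return the highest-scoring one together with its score."""
    tasks = list(tasks)
    first, rest = tasks[0], tasks[1:]
    best_groups, best_score = None, float("-inf")
    # Pin the first task to group A so mirrored partitions are not revisited.
    for k in range(len(rest)):  # k < len(rest) keeps group B non-empty
        for chosen in combinations(rest, k):
            group_a = [first, *chosen]
            group_b = [t for t in rest if t not in chosen]
            score = split_score([group_a, group_b], affinity)
            if score > best_score:
                best_groups, best_score = (group_a, group_b), score
    return best_groups, best_score


# Hypothetical affinities for four tasks; the values are made up for illustration.
affinity = {
    "a": {"b": 0.9, "c": 0.1, "d": 0.0},
    "b": {"a": 0.8, "c": 0.0, "d": 0.1},
    "c": {"a": 0.1, "b": 0.0, "d": 0.7},
    "d": {"a": 0.0, "b": 0.1, "c": 0.9},
}
groups, score = best_two_way_split(["a", "b", "c", "d"], affinity)
print(groups)  # (['a', 'b'], ['c', 'd'])
```

In a full system, each resulting group would then continue training as its own FL task, starting from the all-in-one model parameters; that stage is outside the scope of this sketch.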