Federated Learning (FL) is a well-established technique for privacy-preserving distributed training, and much attention has been given to various aspects of FL training. However, a growing number of applications that consume FL-trained models operate under dynamically and unpredictably variable conditions, rendering a single model insufficient. We argue for training a global family of models cost-efficiently in a federated fashion. Training the models independently for different tradeoff points, however, incurs $O(k)$ cost for any $k$ architectures of interest, and straightforward applications of FL techniques to recent weight-shared training approaches are either infeasible or prohibitively expensive. We propose SuperFed, an architectural framework that incurs $O(1)$ cost to co-train a large family of models in a federated fashion by leveraging weight-shared learning. SuperFed achieves an order-of-magnitude cost saving in both communication and computation through two novel training mechanisms: (a) distribution of weight-shared models to federated clients, and (b) central aggregation of arbitrarily overlapping weight-shared model parameters. Combined, these mechanisms reach an order-of-magnitude (9.43x) reduction in computation and communication cost for training a family of $5 \times 10^{18}$ models, compared to independently training as few as $k = 9$ DNNs, without any accuracy loss.
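To make mechanism (b) concrete, the following is a minimal sketch of one plausible way to aggregate overlapping weight-shared parameters: each shared coordinate is averaged over only the clients whose subnetwork covered it. This is an illustrative assumption, not SuperFed's actual aggregation rule, which the abstract does not specify. The function name `aggregate_overlapping`, the `ClientUpdate` layout, and the unweighted average are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a client update maps a parameter name to (indices, values),
# i.e. the coordinates of the shared weights that the client's subnetwork covered
# and the values it trained for them.
ClientUpdate = Dict[str, Tuple[List[int], List[float]]]


def aggregate_overlapping(
    global_params: Dict[str, List[float]],
    updates: List[ClientUpdate],
) -> Dict[str, List[float]]:
    """Average each shared coordinate over the clients that actually trained it."""
    new_params = {}
    for name, old in global_params.items():
        sums = [0.0] * len(old)
        counts = [0] * len(old)
        for update in updates:
            if name not in update:
                continue  # this client's subnetwork did not include the tensor
            indices, values = update[name]
            for i, v in zip(indices, values):
                sums[i] += v
                counts[i] += 1
        # Coordinates that no client touched keep their previous value.
        new_params[name] = [
            sums[i] / counts[i] if counts[i] else old[i] for i in range(len(old))
        ]
    return new_params


if __name__ == "__main__":
    shared = {"conv.weight": [0.0, 0.0, 0.0, 0.0]}
    small = {"conv.weight": ([0, 1], [1.0, 1.0])}          # narrow subnetwork
    large = {"conv.weight": ([0, 1, 2], [3.0, 5.0, 7.0])}  # wider subnetwork
    print(aggregate_overlapping(shared, [small, large]))
    # {'conv.weight': [2.0, 3.0, 7.0, 0.0]}
```

In this sketch, coordinates 0 and 1 are covered by both clients and are averaged, coordinate 2 is covered only by the wider subnetwork and takes its value directly, and coordinate 3 is untouched. Dividing by the per-coordinate count rather than the total number of clients avoids shrinking weights that only a few subnetworks train.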