Federated Learning (FL) is a decentralized machine learning paradigm in which models are trained on distributed devices and aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies, where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems such as the Internet-of-Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies and asynchronous aggregation, and that decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
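To illustrate the idea behind hierarchical aggregation, the following is a minimal sketch (not Flight's actual API; all names here are hypothetical) of sample-weighted federated averaging applied in two tiers: each intermediate aggregator averages its children's parameters weighted by sample counts and forwards a single (parameters, count) pair upward, so the central server receives one update per subtree instead of one per device.

```python
# Hypothetical sketch of hierarchical weighted FedAvg (not Flight's API).
# Each aggregation step collapses many updates into one, which is how a
# multi-tier topology cuts traffic to the central server.

def fed_avg(updates):
    """updates: list of (params: list[float], n_samples: int) pairs.
    Returns the sample-weighted average and the combined sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(p[i] * n for p, n in updates) / total for i in range(dim)]
    return avg, total

# Two-tier hierarchy: leaf devices -> edge aggregators -> central server.
leaves_a = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
leaves_b = [([5.0, 6.0], 60)]

edge_a = fed_avg(leaves_a)                    # one update for subtree A
edge_b = fed_avg(leaves_b)                    # one update for subtree B
global_model, n = fed_avg([edge_a, edge_b])   # server sees two updates, not three
```

Because the intermediate averages carry their sample counts, the two-tier result is identical to averaging all leaves directly at the server; only the communication pattern changes.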