In federated learning, models can be trained synchronously or asynchronously. Much prior work has focused on aggregation methods that let the server combine multiple local models into a global model with improved performance. These methods ignore the heterogeneity of the training workers, which delays the training of local models and leads to the obsolete information issue. In this paper, we design and develop Asyn2F, an Asynchronous Federated learning Framework with bidirectional model aggregation. With bidirectional model aggregation, Asyn2F on the one hand allows the server to asynchronously aggregate multiple local models into a new global model. On the other hand, it allows each training worker to aggregate the new version of the global model into its local model while that model is being trained, even in the middle of a training epoch. We develop Asyn2F with practical implementation requirements in mind, such as using cloud services for model storage and message queuing protocols for communication. Extensive experiments with different datasets show that models trained by Asyn2F achieve higher performance than those trained by state-of-the-art techniques. The experiments also demonstrate the effectiveness, practicality, and scalability of Asyn2F, making it ready for deployment in real scenarios.
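The two directions of aggregation described above can be illustrated with a minimal sketch. It is not the aggregation rule used by Asyn2F, which the abstract does not specify: the function names (`blend`, `server_aggregate`, `worker_merge`), the staleness discount, and the default rates are all assumptions chosen for illustration, and models are reduced to flat lists of floats.

```python
from typing import List

Weights = List[float]


def blend(a: Weights, b: Weights, weight_b: float) -> Weights:
    """Return the convex combination (1 - weight_b) * a + weight_b * b."""
    if len(a) != len(b):
        raise ValueError("weight vectors must have the same length")
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must lie in [0, 1]")
    return [(1.0 - weight_b) * x + weight_b * y for x, y in zip(a, b)]


def server_aggregate(global_w: Weights, local_w: Weights,
                     staleness: int, base_rate: float = 0.5) -> Weights:
    """Server side: fold one asynchronously arriving local model into the global model.

    Assumption (hypothetical rule): an update computed against an older
    global version (larger staleness) receives a smaller mixing rate.
    """
    rate = base_rate / (1.0 + staleness)
    return blend(global_w, local_w, rate)


def worker_merge(local_w: Weights, new_global_w: Weights,
                 merge_rate: float = 0.5) -> Weights:
    """Worker side: fold a newly published global model into the local model.

    Assumption (hypothetical rule): a fixed mixing rate, applied whenever a
    new global version arrives, including between batches of an epoch.
    """
    return blend(local_w, new_global_w, merge_rate)


if __name__ == "__main__":
    global_model = [0.0, 0.0]
    # A fresh local update (staleness 0) moves the global model halfway.
    global_model = server_aggregate(global_model, [1.0, 1.0], staleness=0)
    print(global_model)
    # A worker still training merges the new global model into its local weights.
    print(worker_merge([2.0, 2.0], global_model))
```

Both directions use the same convex combination here; only the choice of mixing rate differs. A real deployment would replace the flat lists with model tensors and would pick the rates from the framework's own criteria.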