Asynchronous federated learning aims to solve the straggler problem in heterogeneous environments, where clients with limited computational capacity can delay aggregation. Its principle is to let the server update the global model as soon as it receives an update from any client, rather than waiting for updates from multiple clients or for a specified amount of time, as in the synchronous mode. This asynchronous setting introduces the stale-model problem: slow clients may train on their local data using an outdated local model. Consequently, when these locally trained models are uploaded to the server, they may impede the convergence of global training. Effective model aggregation strategies therefore play a significant role in updating the global model. Client scheduling is also critical when heterogeneous clients with diverse computing capacities participate in the federated learning process. This work first investigates how convergence is affected when asynchronous federated learning adopts the aggregation coefficient used in the synchronous mode. It then proposes aggregation solutions that achieve the same convergence result as the synchronous mode, followed by an improved aggregation method with client scheduling. Simulation results in various scenarios demonstrate that the proposed algorithm converges to a level of accuracy similar to that of the classical synchronous federated learning algorithm while effectively accelerating the learning process, especially in its early stage.
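To make the aggregate-on-arrival principle and the stale-model problem concrete, the following is a minimal Python sketch of a server that updates the global model as soon as a single client update arrives. It is not the aggregation method proposed in this work: the abstract does not state the coefficients, so the sketch substitutes a generic staleness-discounted mixing rule. The class name `AsyncServer`, the parameters `base_mix` and `decay`, and the polynomial discount are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AsyncServer:
    """Toy server that aggregates as soon as any single client update arrives."""

    weights: List[float]
    version: int = 0       # number of aggregations performed so far
    base_mix: float = 0.5  # hypothetical base aggregation coefficient
    decay: float = 0.5     # hypothetical staleness-discount exponent

    def pull(self) -> Tuple[List[float], int]:
        """A client downloads the current global model and its version."""
        return list(self.weights), self.version

    def mixing_coefficient(self, staleness: int) -> float:
        # Assumed polynomial discount: the staler the update, the smaller its weight.
        return self.base_mix * (1 + staleness) ** (-self.decay)

    def push(self, client_weights: List[float], client_version: int) -> float:
        """Aggregate one client update immediately and return the coefficient used."""
        # Staleness = how many global updates happened since the client pulled.
        staleness = self.version - client_version
        alpha = self.mixing_coefficient(staleness)
        self.weights = [
            (1 - alpha) * g + alpha * c
            for g, c in zip(self.weights, client_weights)
        ]
        self.version += 1
        return alpha


if __name__ == "__main__":
    server = AsyncServer(weights=[0.0, 0.0])
    _, fast_version = server.pull()
    _, slow_version = server.pull()  # the slow client starts from the same model

    # The fast client reports first; the slow client's update is then one version stale.
    alpha_fast = server.push([1.0, 1.0], fast_version)
    alpha_slow = server.push([1.0, 1.0], slow_version)
    print(alpha_fast, alpha_slow, server.weights)
```

In this sketch the server never waits for a cohort of clients, which is the property that removes the straggler delay. The discount on stale updates is one simple way to limit the damage an outdated local model can do to convergence; the choice of coefficient is the design question that the aggregation strategies in this work address.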