Cross-silo federated learning offers a promising solution for collaboratively training robust and generalized AI models without compromising the privacy of local datasets, e.g., in healthcare, finance, and scientific projects that lack a centralized data facility. Nonetheless, because of the disparity of computing resources among different clients (i.e., device heterogeneity), synchronous federated learning algorithms suffer from degraded efficiency when waiting for straggler clients. Asynchronous federated learning algorithms, in turn, experience degradation in convergence rate and final model accuracy on non-identically and independently distributed (non-IID) heterogeneous datasets due to stale local models and client drift. To address these limitations in cross-silo federated learning with heterogeneous clients and data, we propose FedCompass, an innovative semi-asynchronous federated learning algorithm with a computing power-aware scheduler on the server side, which adaptively assigns varying amounts of training tasks to different clients using knowledge of each client's computing power. FedCompass ensures that multiple locally trained models from clients are received almost simultaneously as a group for aggregation, effectively reducing the staleness of local models. At the same time, the overall training process remains asynchronous, eliminating prolonged waiting periods caused by straggler clients. Using diverse non-IID heterogeneous distributed datasets, we demonstrate that FedCompass achieves faster convergence and higher accuracy than other asynchronous algorithms while remaining more efficient than synchronous algorithms when performing federated learning on heterogeneous clients. The source code for FedCompass is available at https://github.com/APPFL/FedCompass.
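The core idea of a computing power-aware scheduler can be sketched in a few lines: given an estimate of each client's speed, the server assigns a number of local training steps so that every client in a group finishes at roughly the same wall-clock time. The sketch below is a simplified illustration, not the actual FedCompass algorithm; the function name, the `q_min`/`q_max` step bounds, and the speed estimates are all assumptions made for this example.

```python
# Illustrative sketch only -- NOT the official FedCompass scheduler.
# Idea: assign local steps proportional to each client's estimated speed
# so that all clients in a group complete training nearly simultaneously,
# reducing the staleness of local models at aggregation time.

def assign_local_steps(speeds, target_time, q_min=1, q_max=500):
    """Return {client_id: local_steps} so each client finishes near target_time.

    speeds: {client_id: estimated local training steps per second}
    target_time: desired group completion time in seconds
    q_min, q_max: hypothetical bounds keeping any client's workload reasonable
    """
    return {
        cid: max(q_min, min(q_max, round(speed * target_time)))
        for cid, speed in speeds.items()
    }

# Three clients with a 10x spread in computing power:
speeds = {"A": 10.0, "B": 4.0, "C": 1.0}
steps = assign_local_steps(speeds, target_time=20.0)
# Each client's expected completion time is steps[cid] / speeds[cid] ~= 20 s,
# so the server receives all three local models as a group for aggregation.
```

Because faster clients perform more local steps per round while slower clients perform fewer, no client sits idle waiting for stragglers, yet the updates still arrive close together in time.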