Cross-silo federated learning offers a promising way to collaboratively train robust, generalized AI models without compromising the privacy of local datasets, for example in healthcare, finance, and scientific projects that lack a centralized data facility. However, because computing resources differ across clients (device heterogeneity), synchronous federated learning algorithms lose efficiency while waiting for straggler clients. Asynchronous federated learning algorithms, in turn, suffer degraded convergence rates and final model accuracy on non-independent and identically distributed (non-IID) heterogeneous datasets because of stale local models and client drift. To address these limitations in cross-silo federated learning with heterogeneous clients and data, we propose FedCompass, a semi-asynchronous federated learning algorithm with a computing-power-aware scheduler on the server side that uses knowledge of each client's computing power to adaptively assign different amounts of training work to different clients. FedCompass ensures that multiple locally trained models arrive from clients almost simultaneously as a group for aggregation, which effectively reduces the staleness of local models. At the same time, the overall training process remains asynchronous, eliminating prolonged waits for straggler clients. Using diverse non-IID heterogeneous distributed datasets, we demonstrate that FedCompass converges faster and reaches higher accuracy than other asynchronous algorithms while remaining more efficient than synchronous algorithms when performing federated learning on heterogeneous clients.
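The scheduling idea in the abstract, giving faster clients more local work so that a group of local models arrives at the server at roughly the same time, can be illustrated with a minimal sketch. This is not the FedCompass scheduler itself: the function name `assign_local_steps`, the bounds `q_min` and `q_max`, and the assumption that the server already holds a per-step time estimate for each client are all hypothetical choices made for illustration.

```python
def assign_local_steps(step_times, q_min, q_max):
    """Assign a number of local training steps to each client.

    step_times: dict mapping client id -> estimated seconds per local step
                (assumed to be known to the server; hypothetical input).
    q_min, q_max: hypothetical lower and upper bounds on local steps.
    """
    # The slowest client receives q_min steps; its finish time is the target
    # arrival time for the whole group.
    target = q_min * max(step_times.values())

    assignment = {}
    for client, seconds_per_step in step_times.items():
        # Number of steps this client can complete before the target time.
        steps = int(target // seconds_per_step)
        # Clamp so that no client does too little or too much local work.
        assignment[client] = max(q_min, min(q_max, steps))
    return assignment


# Three clients whose per-step times differ by up to 4x.
plan = assign_local_steps({"a": 1.0, "b": 2.0, "c": 4.0}, q_min=10, q_max=50)
print(plan)  # {'a': 40, 'b': 20, 'c': 10}
```

In this example every client finishes after about 40 seconds, so the three local models can be aggregated together and none of them is stale relative to the others. When the upper bound `q_max` clips a fast client's assignment, that client finishes early, which is the situation a semi-asynchronous design has to handle by grouping arrivals rather than blocking on a global barrier.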