Cloud-edge-device hierarchical federated learning (HFL) has recently been proposed to achieve communication-efficient and privacy-preserving distributed learning. However, HFL faces several critical challenges, such as single points of failure and potential stragglers among both edge servers and local devices. To resolve these issues, we propose a decentralized and straggler-tolerant blockchain-based HFL (BHFL) framework. Specifically, a Raft-based consortium blockchain is deployed on edge servers to provide a distributed and trusted computing environment for global model aggregation in BHFL. To mitigate the influence of stragglers on learning, we propose a novel aggregation method, HieAvg, which uses the historical weights of stragglers to estimate their missing submissions. Furthermore, we optimize the overall latency of BHFL by jointly considering the constraints of global model convergence and blockchain consensus delay. Theoretical analysis and experimental evaluation show that BHFL with HieAvg converges in the presence of stragglers and outperforms traditional methods, even when the loss function is non-convex and the data on local devices are non-independent and identically distributed (non-IID).
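The abstract does not state HieAvg's exact update rule. As a rough illustration of the general idea of filling in a straggler's missing submission from its historical weights, the sketch below assumes a FedAvg-style average weighted by each device's sample count and a simple linear extrapolation from a device's last two real submissions. The class `StragglerTolerantAggregator` and its estimation rule are hypothetical and are not the paper's HieAvg.

```python
from typing import Dict, List, Optional

Vector = List[float]


class StragglerTolerantAggregator:
    """Hypothetical sketch (not HieAvg itself): sample-weighted averaging in which
    a straggler's missing update is estimated from its own submission history."""

    def __init__(self, num_samples: Dict[str, int]):
        # Assumed aggregation weight per device: its local sample count.
        self.num_samples = num_samples
        # Only real (received) submissions are stored; estimates are never recorded.
        self.history: Dict[str, List[Vector]] = {d: [] for d in num_samples}

    def _estimate(self, device: str) -> Optional[Vector]:
        past = self.history[device]
        if not past:
            return None  # No history yet, so the device cannot be estimated.
        if len(past) == 1:
            return list(past[-1])  # Reuse the only known weights.
        last, prev = past[-1], past[-2]
        # Assumed rule: extrapolate one step along the most recent observed change.
        return [l + (l - p) for l, p in zip(last, prev)]

    def aggregate(self, submissions: Dict[str, Vector]) -> Vector:
        weights: List[int] = []
        vectors: List[Vector] = []
        for device, n in self.num_samples.items():
            if device in submissions:
                w = submissions[device]
                self.history[device].append(list(w))
            else:
                est = self._estimate(device)
                if est is None:
                    continue  # Skip stragglers with no history.
                w = est
            weights.append(n)
            vectors.append(w)
        if not vectors:
            raise ValueError("no submissions or history available to aggregate")
        total = sum(weights)
        dim = len(vectors[0])
        # Weighted average, coordinate by coordinate.
        return [sum(n * v[i] for n, v in zip(weights, vectors)) / total
                for i in range(dim)]


agg = StragglerTolerantAggregator({"a": 1, "b": 1})
print(agg.aggregate({"a": [1.0], "b": [3.0]}))  # [2.0]
print(agg.aggregate({"a": [2.0], "b": [5.0]}))  # [3.5]
print(agg.aggregate({"a": [3.0]}))  # "b" is a straggler; estimated as 5 + (5 - 3) = 7 -> [5.0]
```

In a hierarchical setting, the same routine could in principle run at both levels, with edge servers aggregating device updates and the cloud (or here, the blockchain-hosted aggregation) combining edge models. Whether the paper applies estimation at one or both levels is not stated in the abstract.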