As a promising distributed learning paradigm, federated learning (FL) trains deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients. When training large-scale DNN models, batch normalization (BN) is regarded as a simple and effective means to accelerate training and improve generalization. However, recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data. While several FL algorithms have been proposed to address this issue, their performance still falls significantly short of the centralized scheme. Furthermore, none of them has provided a theoretical explanation of how BN damages FL convergence. In this paper, we present the first convergence analysis showing that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes a gradient deviation between the local and global models, which in turn slows down and biases the FL convergence. In view of this, we develop a new FL algorithm tailored to BN, called FedTAN, which achieves robust FL performance under a variety of data distributions via iterative layer-wise parameter aggregation. Comprehensive experimental results demonstrate the superiority of the proposed FedTAN over existing baselines for training BN-based DNN models.
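The statistical mismatch described in the abstract can be illustrated with a small NumPy sketch. It is not the FedTAN algorithm; it only shows, under assumed toy data, that BN statistics computed on one client's non-i.i.d. batch differ from the statistics of the pooled data, and that the pooled statistics can be recovered by aggregating per-client means and variances. The client arrays, their distributions, and the helper names `local_bn_stats`, `aggregate_bn_stats`, and `batch_norm` are hypothetical and chosen for illustration.

```python
import numpy as np


def local_bn_stats(x):
    """Per-feature batch mean and biased variance, as BN computes in training mode."""
    return x.mean(axis=0), x.var(axis=0)


def aggregate_bn_stats(means, variances, counts):
    """Combine per-client statistics into the statistics of the pooled batch.

    Uses the identity Var[x] = E[x^2] - (E[x])^2, with sample-count weights.
    """
    weights = np.asarray(counts, dtype=float)
    weights = weights / weights.sum()
    means = np.stack(means)
    variances = np.stack(variances)
    global_mean = (weights[:, None] * means).sum(axis=0)
    second_moment = (weights[:, None] * (variances + means**2)).sum(axis=0)
    return global_mean, second_moment - global_mean**2


def batch_norm(x, mean, var, eps=1e-5):
    """Normalize x with the given statistics (affine scale and shift omitted)."""
    return (x - mean) / np.sqrt(var + eps)


# Assumed toy setup: two clients whose features follow different distributions.
rng = np.random.default_rng(0)
client_a = rng.normal(loc=-2.0, scale=1.0, size=(64, 4))
client_b = rng.normal(loc=3.0, scale=0.5, size=(32, 4))

stats_a = local_bn_stats(client_a)
stats_b = local_bn_stats(client_b)
global_mean, global_var = aggregate_bn_stats(
    [stats_a[0], stats_b[0]],
    [stats_a[1], stats_b[1]],
    [len(client_a), len(client_b)],
)

# Reference: statistics a centralized trainer would see on the pooled batch.
pooled = np.concatenate([client_a, client_b])

# Largest gap between client A's activations normalized with local statistics
# and the same activations normalized with global statistics.
mismatch = np.abs(
    batch_norm(client_a, *stats_a) - batch_norm(client_a, global_mean, global_var)
).max()
print("max normalized-output mismatch on client A:", mismatch)
```

Because the normalized activations differ, any gradient computed through the BN layer on a client also differs from the one a centralized model would compute, which is the deviation the convergence analysis attributes to non-i.i.d. data.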