As a promising distributed learning paradigm, federated learning (FL) trains deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients. For training large-scale DNN models, batch normalization (BN) is regarded as a simple and effective means of accelerating training and improving generalization. However, recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data. While several FL algorithms have been proposed to address this issue, their performance still falls significantly short of that of the centralized scheme. Furthermore, none of them has provided a theoretical explanation of how BN damages FL convergence. In this paper, we present the first convergence analysis showing that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes gradient deviation between the local and global models, which in turn slows down and biases FL convergence. In view of this, we develop a new FL algorithm tailored to BN, called FedTAN, which achieves robust FL performance under a variety of data distributions via iterative layer-wise parameter aggregation. Comprehensive experimental results demonstrate the superiority of the proposed FedTAN over existing baselines for training BN-based DNN models.
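The statistical mismatch at the heart of this analysis can be illustrated with a toy sketch. The snippet below is not FedTAN and does not reproduce the paper's analysis; it only assumes two hypothetical clients with differently distributed one-dimensional activations at a single BN layer, and compares each client's local batch statistics with the statistics of the pooled data. The names `client_a`, `client_b`, `bn_stats`, `aggregate_stats`, and `normalize` are illustrative assumptions.

```python
from statistics import fmean, pvariance

# Hypothetical activations at one BN layer for two clients with non-i.i.d. data:
# client A only sees small values, client B only sees large ones.
client_a = [0.0, 1.0, 2.0, 3.0]
client_b = [10.0, 11.0, 12.0, 13.0]


def bn_stats(xs):
    """Return the (mean, biased variance) that BN computes over one batch."""
    return fmean(xs), pvariance(xs)


def aggregate_stats(batches):
    """Combine per-client (mean, variance) into the statistics of the pooled data.

    Uses the identity Var[x] = E[x^2] - E[x]^2, so only each client's mean,
    variance, and sample count are needed, not its raw activations.
    """
    total = sum(len(b) for b in batches)
    stats = [bn_stats(b) for b in batches]
    mean = sum(len(b) * m for b, (m, _) in zip(batches, stats)) / total
    second_moment = sum(
        len(b) * (v + m * m) for b, (m, v) in zip(batches, stats)
    ) / total
    return mean, second_moment - mean * mean


def normalize(x, stats, eps=1e-5):
    """Apply the BN normalization step (without the learnable scale and shift)."""
    mean, var = stats
    return (x - mean) / (var + eps) ** 0.5


local_a = bn_stats(client_a)
local_b = bn_stats(client_b)
global_stats = aggregate_stats([client_a, client_b])
pooled_stats = bn_stats(client_a + client_b)  # what a centralized scheme would see

print("local A:", local_a)
print("local B:", local_b)
print("global :", global_stats)

# The same activation is normalized very differently under local vs. global statistics.
print("x=2.0 with local A stats:", normalize(2.0, local_a))
print("x=2.0 with global stats :", normalize(2.0, global_stats))
```

In this toy setting the same activation lands on opposite sides of zero depending on which statistics are used, so every downstream layer, and hence the gradient, sees a different input. Aggregating only per-client means, variances, and counts recovers the pooled statistics exactly, which gives some intuition for why aggregating BN statistics across clients is a natural remedy; how FedTAN actually performs its layer-wise aggregation is specified in the paper, not here.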