Batch Normalization (BN) is commonly used in modern deep learning to improve stability and speed up convergence in centralized training. In federated learning (FL) with non-IID decentralized data, previous works observed that training with BN can hinder performance because the BN statistics used in training do not match those used in testing. Group Normalization (GN) is therefore more often used in FL as an alternative to BN. In this paper, we identify a more fundamental issue of BN in FL that makes BN inferior even with high-frequency communication between clients and servers. We then propose a frustratingly simple treatment that significantly improves BN and makes it outperform GN across a wide range of FL settings. Along with this study, we also reveal a counterintuitive behavior of BN in FL: it is quite robust in the low-frequency communication regime, where FL is commonly believed to degrade drastically. We hope that our study can serve as a valuable reference for future practical usage and theoretical analysis in FL.
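As a toy illustration of the statistics mismatch observed in previous works, the sketch below compares the BN statistics computed on two clients with non-IID data against statistics pooled over both. The client activations, the helper names `bn_stats` and `normalize`, and the use of pooled statistics as a stand-in for test-time statistics are all assumptions made for illustration. The sketch covers neither the more fundamental issue nor the treatment studied in this paper.

```python
from statistics import fmean, pvariance


def bn_stats(values):
    """Return the (mean, variance) a BN layer would compute over one batch."""
    return fmean(values), pvariance(values)


def normalize(x, mean, var, eps=1e-5):
    """Apply BN-style normalization, omitting the learned scale and shift."""
    return (x - mean) / (var + eps) ** 0.5


# Hypothetical one-dimensional activations on two clients with non-IID data.
client_a = [-3.0, -2.0, -1.0]
client_b = [2.0, 3.0, 4.0]

local_a = bn_stats(client_a)            # statistics seen during local training on A
local_b = bn_stats(client_b)            # statistics seen during local training on B
pooled = bn_stats(client_a + client_b)  # assumed stand-in for test-time statistics

# The same activation is normalized very differently under the two sets of statistics.
x = -2.0
z_local = normalize(x, *local_a)
z_pooled = normalize(x, *pooled)

print("client A stats:", local_a)
print("client B stats:", local_b)
print("pooled stats:  ", pooled)
print("x normalized with client A stats:", z_local)
print("x normalized with pooled stats:  ", z_pooled)
```

In this toy setting, the activation sits exactly at the mean of client A's batch but almost one standard deviation below the pooled mean, so the layers after BN would see differently scaled inputs in training and in testing.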