Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization. However, in federated learning (FL) with decentralized data, prior work has observed that training with BN can hinder performance and has suggested replacing it with Group Normalization (GN). In this paper, we revisit this substitution by expanding the empirical study conducted in prior work. Surprisingly, we find that BN outperforms GN in many FL settings; the exceptions are high-frequency communication and extreme non-IID regimes. We reinvestigate the factors believed to cause this problem, including the mismatch of BN statistics across clients and the deviation of gradients during local training. We empirically identify a simple practice that can reduce the impact of these factors while maintaining the strengths of BN. Our approach, which we name FIXBN, is easy to implement, incurs no additional training or communication costs, and performs favorably across a wide range of FL settings. We hope that our study can serve as a valuable reference for future practical usage and theoretical analysis in FL.