Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks. The regularization effect of BN depends on the batch size, and explicitly using smaller batch sizes with Batch Normalization, a method known as Ghost Batch Normalization (GBN), has been found to improve generalization in many settings. We investigate the effectiveness of GBN by disentangling the induced "Ghost Noise" from normalization and quantitatively analyzing the distribution of this noise as well as its impact on model performance. Inspired by our analysis, we propose a new regularization technique called Ghost Noise Injection (GNI) that imitates the noise in GBN without incurring the detrimental train-test discrepancy effects of small-batch training. We experimentally show that GNI can provide a greater generalization benefit than GBN. Ghost Noise Injection can also be beneficial in otherwise non-noisy settings such as layer-normalized networks, providing additional evidence of the usefulness of Ghost Noise in Batch Normalization as a regularizer.
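The following is a minimal NumPy sketch of GBN. It assumes 2-D activations of shape [N, C], a batch size that is divisible by the ghost batch size, and no learned affine parameters. The helper names (`batch_norm`, `ghost_batch_norm`, `ghost_noise`) are hypothetical. The sketch also shows one way to separate the ghost noise from normalization. For a ghost batch g with mean mu_g and scale s_g, the identity (x - mu_g)/s_g = x_hat * (s/s_g) + (mu - mu_g)/s_g holds exactly. Here x_hat is the full-batch normalized activation. GBN is therefore full-batch BN followed by a per-ghost-batch multiplicative term and an additive term. This illustrates the decomposition only and is not the authors' GNI implementation.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature of x (shape [N, C]) using statistics of the whole batch."""
    mu = x.mean(axis=0)
    s = np.sqrt(x.var(axis=0) + eps)
    return (x - mu) / s

def _split(x, ghost_size):
    """Reshape [N, C] into [N // ghost_size, ghost_size, C] ghost batches."""
    n = x.shape[0]
    if n % ghost_size != 0:
        raise ValueError("batch size must be divisible by ghost_size")
    return x.reshape(n // ghost_size, ghost_size, *x.shape[1:])

def ghost_batch_norm(x, ghost_size, eps=1e-5):
    """GBN: normalize each ghost batch with its own mean and variance."""
    ghosts = _split(x, ghost_size)
    mu_g = ghosts.mean(axis=1, keepdims=True)
    s_g = np.sqrt(ghosts.var(axis=1, keepdims=True) + eps)
    return ((ghosts - mu_g) / s_g).reshape(x.shape)

def ghost_noise(x, ghost_size, eps=1e-5):
    """Return the per-ghost-batch (scale, shift) that turn full-batch BN into GBN.

    Both arrays have shape [N // ghost_size, 1, C] and satisfy
    GBN(x) == BN(x) * scale + shift within each ghost batch.
    """
    mu = x.mean(axis=0)
    s = np.sqrt(x.var(axis=0) + eps)
    ghosts = _split(x, ghost_size)
    mu_g = ghosts.mean(axis=1, keepdims=True)
    s_g = np.sqrt(ghosts.var(axis=1, keepdims=True) + eps)
    scale = s / s_g              # multiplicative ghost noise
    shift = (mu - mu_g) / s_g    # additive ghost noise
    return scale, shift

# Usage: check that GBN equals full-batch BN plus the extracted ghost noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
gbn = ghost_batch_norm(x, ghost_size=16)
scale, shift = ghost_noise(x, ghost_size=16)
x_hat = batch_norm(x).reshape(4, 16, 8)
reconstructed = (x_hat * scale + shift).reshape(x.shape)
print(np.allclose(gbn, reconstructed))
```

When `ghost_size` equals the full batch size, `scale` is 1 and `shift` is 0, so GBN reduces to ordinary BN. Smaller ghost batches produce noisier estimates of mu_g and s_g, which means larger fluctuations in both noise terms.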