Batch normalization (BN) is a ubiquitous operation in deep neural networks, used primarily to improve training stability and to regularize the network. BN centers and scales feature maps using sample means and variances, estimates that are naturally amenable to Stein shrinkage. Applying such shrinkage yields batch mean and variance estimates that are more accurate in the mean-squared-error sense. In this paper, we prove that the Stein shrinkage estimators of the mean and variance dominate the sample mean and variance estimators, respectively, in the presence of adversarial attacks modeled using sub-Gaussian distributions. Furthermore, by construction, James-Stein (JS) BN has a smaller local Lipschitz constant than vanilla BN, implying better regularity properties and potentially improved robustness. These results justify applying Stein shrinkage to estimate the mean and variance parameters in BN, and using it in image classification and segmentation tasks with and without adversarial attacks. We present state-of-the-art (SOTA) performance results using this Stein-corrected BN in a standard ResNet architecture for image classification on CIFAR-10, a 3D CNN on PPMI (neuroimaging) data, and image segmentation with HRNet on Cityscapes data, with and without adversarial attacks.
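To make the core idea concrete, below is a minimal NumPy sketch of a JS-corrected normalization step. It applies the classical positive-part James-Stein shrinkage (toward zero) to the vector of per-channel sample means before normalizing; the function names, the 2D `(batch, channels)` input shape, and the choice of shrinkage target are illustrative assumptions, not the paper's exact estimator (which also shrinks the variances).

```python
import numpy as np

def js_shrink(est, noise_var):
    """Positive-part James-Stein shrinkage toward zero:
    (1 - (d - 2) * noise_var / ||est||^2)_+ * est, for a d-vector, d >= 3."""
    d = est.size
    sq_norm = float(np.sum(est ** 2))
    factor = max(0.0, 1.0 - (d - 2) * noise_var / sq_norm)
    return factor * est

def js_batch_norm(x, eps=1e-5):
    """Hypothetical JS-corrected batch norm for x of shape (N, C).

    The vector of C per-channel sample means is shrunk jointly before
    centering; the sample variances are used as-is in this sketch.
    """
    n, c = x.shape
    mu = x.mean(axis=0)            # per-channel sample means (a C-vector)
    var = x.var(axis=0)            # per-channel sample variances
    # The sampling variance of each channel mean is roughly var / n;
    # use the channel average as a single plug-in noise level (an assumption).
    mu_js = js_shrink(mu, noise_var=var.mean() / n)
    return (x - mu_js) / np.sqrt(var + eps)
```

Because the shrinkage factor lies in [0, 1], the shrunk mean never overshoots the sample mean, which is the mechanism behind the smaller local Lipschitz constant claimed above.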