In this paper, we analyze batch normalization from the perspective of discriminability and identify a disadvantage overlooked by previous studies: differences in the $l_2$ norms of sample features can hinder batch normalization from producing more distinguishable inter-class features and more compact intra-class features. To address this issue, we propose a simple yet effective method that equalizes the $l_2$ norms of sample features. Concretely, we $l_2$-normalize each sample feature before feeding it into batch normalization, so that all features have the same magnitude. Since the proposed method combines $l_2$ normalization and batch normalization, we name it $L_2$BN. $L_2$BN strengthens the compactness of intra-class features and enlarges the discrepancy between inter-class features. It is easy to implement and takes effect without any additional parameters or hyper-parameters. We evaluate $L_2$BN through extensive experiments with various models on image classification and acoustic scene classification tasks. The results demonstrate that $L_2$BN improves the generalization ability of various neural network models and achieves considerable performance improvements.
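The core operation described above, $l_2$-normalizing each sample feature and then applying batch normalization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`l2_normalize`, `batch_norm`, `l2bn`), the `eps` constants, and the toy data are all hypothetical, and the sketch assumes flat feature vectors, training-mode batch statistics, and no learnable affine transform in the batch normalization step.

```python
import math


def l2_normalize(x, eps=1e-12):
    """Scale one feature vector to unit l2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def batch_norm(batch, eps=1e-5):
    """Standardize each feature dimension using the batch mean and (biased) variance.

    Assumption: no learnable scale/shift and no running statistics.
    """
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for row in batch
    ]


def l2bn(batch):
    """Hypothetical L2BN sketch: per-sample l2 normalization, then batch normalization."""
    return batch_norm([l2_normalize(x) for x in batch])


# Toy batch: the first two samples point in the same direction
# but differ in magnitude by a factor of 10.
features = [[3.0, 4.0], [30.0, 40.0], [-1.0, 2.0], [0.5, -0.1]]
normalized = [l2_normalize(x) for x in features]
out = l2bn(features)
```

In this toy batch, the first two samples differ only in magnitude, so they map to the same point after the $l_2$ step and therefore receive identical outputs from batch normalization, which is the equalizing effect the method relies on.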