Batch normalization (BatchNorm) is a widely used technique for normalizing layer activations when training deep neural networks. It has been shown to improve both the training speed and the accuracy of deep learning models. However, the mechanisms by which BatchNorm achieves these benefits are an active area of research, and several different perspectives have been proposed. In this paper, we investigate the effect of BatchNorm on the resulting hidden representations, that is, the vectors of activation values formed as samples are processed at each hidden layer. Specifically, we consider the sparsity of these representations, as well as their implicit clustering -- the emergence of groups of representations that are similar to some extent. We contrast image classification models trained with and without batch normalization and highlight the consistent differences we observe. Our findings indicate that BatchNorm's effect on representational sparsity is not a significant factor in generalization, whereas the representations of models trained with BatchNorm tend to exhibit more advantageous clustering characteristics.
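To make the two quantities concrete, the sketch below measures one of them, representational sparsity, on a toy untrained layer. It applies a hand-rolled batch normalization (mean/variance over the batch dimension; the learnable scale and shift are omitted for brevity) before a ReLU and reports the fraction of exactly-zero activations with and without it. All names and the toy setup are illustrative, not the paper's actual experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def batchnorm(x, eps=1e-5):
    # Normalize each feature over the batch dimension.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sparsity(h):
    # Fraction of exactly-zero entries in the hidden representation
    # (zeros are produced by the ReLU clipping negative pre-activations).
    return float(np.mean(h == 0.0))

# Toy hidden layer: random inputs and random weights (untrained network).
x = rng.normal(size=(256, 64))           # batch of 256 samples, 64 features
w = rng.normal(size=(64, 128)) * 0.1     # 128 hidden units
b = rng.normal(size=(128,))              # nonzero biases shift pre-activations

pre = x @ w + b
h_plain = np.maximum(pre, 0.0)           # ReLU without BatchNorm
h_bn = np.maximum(batchnorm(pre), 0.0)   # BatchNorm before ReLU

print(f"sparsity without BN: {sparsity(h_plain):.3f}")
print(f"sparsity with BN:    {sparsity(h_bn):.3f}")
```

Because batch normalization centers each unit's pre-activations at zero, roughly half of the post-ReLU activations are zeroed, which is why BatchNorm is a natural candidate for shaping representational sparsity in the first place.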