The sparsity of deep neural networks has been studied extensively, with the goal of maximizing performance while reducing the size of overparameterized networks as much as possible. Existing methods focus on pruning parameters during training using thresholds and importance metrics. Feature similarity between different layers, however, has received little attention in this context, although this paper rigorously proves that it is highly correlated with network sparsity. Motivated by the interlayer feature similarity observed in overparameterized models, we investigate the intrinsic link between network sparsity and interlayer feature similarity. Specifically, we use information bottleneck theory to prove that reducing interlayer feature similarity, as measured by Centered Kernel Alignment (CKA), increases the sparsity of the network. Building on this result, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which uses CKA to reduce feature similarity between layers and thereby increase network sparsity. In other words, each layer of the resulting sparse network tends to develop its own distinct identity. Experimentally, we plug CKA-SR into the training process of existing sparse training methods and find that it consistently improves the performance of several state-of-the-art methods, especially at extremely high sparsity. Code is included in the supplementary materials.
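To make the idea of a CKA-based regularizer concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes linear CKA between two feature matrices (rows are samples, columns are features) and sums it over all pairs of layers to form a penalty. The function names, the choice of the linear kernel, the sum over all layer pairs, and the weight `beta` are assumptions made for illustration; the abstract does not specify these details. In actual training the same quantity would be computed on differentiable tensors and added to the task loss.

```python
import math


def center_columns(features):
    """Subtract the per-column mean from a matrix given as a list of rows."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in features]


def cross_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for two matrices with equal row counts."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_sq_norm(xc, yc)
    denominator = math.sqrt(cross_sq_norm(xc, xc) * cross_sq_norm(yc, yc))
    # Constant features have zero variance; treat their similarity as 0 (assumption).
    return numerator / denominator if denominator > 0 else 0.0


def cka_penalty(layer_features, beta=1.0):
    """Hypothetical regularizer: beta times the sum of pairwise CKA across layers."""
    total = 0.0
    for i in range(len(layer_features)):
        for j in range(i + 1, len(layer_features)):
            total += linear_cka(layer_features[i], layer_features[j])
    return beta * total
```

Minimizing a term like `cka_penalty` alongside the task loss pushes the layer features apart in the CKA sense, which is the mechanism the abstract links to higher sparsity. Linear CKA is invariant to isotropic scaling of the features, so the penalty cannot be reduced simply by shrinking activations.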