Learning Sparse Neural Networks with Identity Layers

The sparsity of Deep Neural Networks has been studied extensively as a way to maximize performance while shrinking overparameterized networks as much as possible. Existing methods focus on pruning parameters during training using thresholds and metrics. Meanwhile, feature similarity between different layers has not been discussed sufficiently, even though, as this paper rigorously proves, it is highly correlated with network sparsity. Inspired by interlayer feature similarity in overparameterized models, we investigate the intrinsic link between network sparsity and interlayer feature similarity. Specifically, using information bottleneck theory, we prove that reducing interlayer feature similarity measured by Centered Kernel Alignment (CKA) improves network sparsity. Building on this theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which uses CKA to reduce feature similarity between layers and thereby increase network sparsity. In other words, each layer of our sparse network tends to develop its own identity, distinct from the other layers. Experimentally, we plug CKA-SR into the training process of sparse network training methods and find that it consistently improves the performance of several state-of-the-art sparse training methods, especially at extremely high sparsity. Code is included in the supplementary materials.
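To make the CKA similarity measure concrete, here is a minimal NumPy sketch of the standard linear-kernel form of CKA, plus a hypothetical interlayer penalty in the spirit of CKA-SR. The linear kernel, the names `linear_cka` and `cka_regularizer`, averaging over every pair of layers, and the coefficient `beta` are all illustrative assumptions, not the paper's exact formulation.

```python
import itertools

import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between feature matrices of shape (n_samples, n_features).

    Returns a value in [0, 1]; 1 means the two representations are identical
    up to an orthogonal transform and isotropic scaling.
    """
    # Center each feature (column) across the batch of samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Cross term ||Y^T X||_F^2, normalized by ||X^T X||_F * ||Y^T Y||_F.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_regularizer(layer_features, beta=1e-2):
    """Hypothetical penalty: beta times the mean pairwise linear CKA between layers.

    Minimizing it pushes layers toward dissimilar ("identity") features.
    """
    # Flatten each layer's activations (e.g. N x C x H x W) to N x (C*H*W),
    # so layers of different widths can be compared on the same samples.
    feats = [f.reshape(f.shape[0], -1) for f in layer_features]
    pairs = list(itertools.combinations(range(len(feats)), 2))
    if not pairs:
        return 0.0
    total = sum(linear_cka(feats[i], feats[j]) for i, j in pairs)
    return beta * total / len(pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((64, 32))
    b = rng.standard_normal((64, 16))
    print(linear_cka(a, a))        # ~1.0 up to floating-point error
    print(linear_cka(a, 3.0 * a))  # scale-invariant, also ~1.0
    print(linear_cka(a, b))        # unrelated random features: lower value
    print(cka_regularizer([a, b, a], beta=0.1))
```

In an actual training loop the penalty would be added to the task loss (for example, cross-entropy) and computed in an autodiff framework so that its gradient reaches the weights; NumPy is used here only to show the arithmetic.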
