Unstructured pruning is a popular method for compressing models by removing individual parameters. However, it is commonly believed that, although it reduces the parameter count, unstructured pruning cannot shorten the computational critical path, i.e., the maximum number of layers traversed during forward propagation. In this paper, we study when and how unstructured pruning can nonetheless yield structural effects. For rectifier-activated (ReLU) networks, we introduce the notion of neuron entropy, which quantifies how much a neuron exploits its nonlinearity. We show that magnitude-based pruning naturally lowers this entropy, sometimes producing zero-entropy layers that become linearizable and can therefore be removed. Building on this insight, we propose a method that steers "unstructured" pruning toward sparsity in low-entropy layers, enabling their complete removal. We validate the phenomenon across CNNs, Vision Transformers, and NLP models: in over-parameterized networks, unstructured pruning can induce effective layer removal with little or no performance degradation.
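A minimal sketch of the core intuition, under assumptions not spelled out in the abstract: here "neuron entropy" is taken to be the binary entropy of a ReLU neuron's on/off pattern over a batch (zero bits means the neuron is always active, hence linear, or always inactive, hence removable), and pruning is plain global magnitude pruning. All names and constants below are illustrative, not the paper's actual definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def neuron_entropy(pre_acts):
    """Binary entropy (bits) of each neuron's ReLU activation pattern.

    0 bits => the neuron is always on (behaves linearly) or always off
    (contributes nothing), so its nonlinearity is unused.
    """
    p = np.mean(pre_acts > 0, axis=0)          # fraction of inputs where neuron fires
    p = np.clip(p, 1e-12, 1 - 1e-12)           # avoid log(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# A toy layer: 64 inputs, 8 ReLU neurons, positive bias.
X = rng.normal(size=(4096, 64))
W = rng.normal(size=(64, 8))
b = np.full(8, 1.0)

ent_before = neuron_entropy(X @ W + b)

# Global magnitude pruning: zero the smallest 95% of weights.
# This shrinks the weight-driven variance of each pre-activation while the
# bias stays fixed, so the sign of the pre-activation becomes more
# predictable and the activation-pattern entropy drops.
thresh = np.quantile(np.abs(W), 0.95)
W_pruned = np.where(np.abs(W) >= thresh, W, 0.0)

H_pruned = X @ W_pruned + b
ent_after = neuron_entropy(H_pruned)

# Neurons that ended up always-on are linearizable: on this data the layer
# computes an affine map for them, so it could be merged into its neighbor.
always_on = np.all(H_pruned > 0, axis=0)
print("mean entropy before:", ent_before.mean())
print("mean entropy after: ", ent_after.mean())
print("always-on (linearizable) neurons:", int(always_on.sum()))
```

In this toy setting the average entropy falls after pruning, and any neuron whose incoming weights were all pruned away reduces to `ReLU(b) = b`, a constant: the extreme, zero-entropy case the abstract describes as enabling layer removal.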