Structured network pruning outperforms unstructured methods because it can take advantage of well-developed parallel computing techniques. In this paper, we propose a new structured pruning method. First, to create more structured redundancy, we present a data-driven loss term, named CCM-loss, calculated from the correlation coefficient matrix of the feature maps in the same layer. This loss term encourages the neural network to learn stronger linear relations between feature maps when training from scratch, so that more homogeneous parts can be removed later during pruning. Whereas L*-norm regularization concentrates on generating zeros, CCM-loss provides another general-purpose mathematical tool that generates a different kind of redundancy. Furthermore, we design a matching channel selection strategy based on principal component analysis to exploit the full potential of CCM-loss. This strategy focuses on the consistency and integrity of the information flow in the network. Instead of empirically hard-coding the retain ratio for each layer, our channel selection strategy dynamically adjusts each layer's retain ratio according to the specific state of a pre-trained model, pushing the prune ratio to its limit. Notably, on the CIFAR-10 dataset, our method achieves 93.64% accuracy for a pruned VGG-16 with only 1.40M parameters and 49.60M FLOPs; the pruned ratios for parameters and FLOPs are 90.6% and 84.2%, respectively. For ResNet-50 trained on the ImageNet dataset, our approach achieves 42.8% and 47.3% reductions in storage and computation, respectively, with an accuracy of 76.23%. Our code is available at https://github.com/Bojue-Wang/CCM-LRR.
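The abstract does not state the exact form of CCM-loss or of the PCA-based selection rule, so the following is only a minimal NumPy sketch under stated assumptions. It assumes the loss is one minus the mean absolute off-diagonal correlation between the feature maps of a layer, and that a layer's retain count is the number of principal components needed to reach a chosen fraction of the total variance. The function names `ccm_style_loss` and `pca_retain_count`, the `energy` threshold, and the toy data are all hypothetical and are not taken from the authors' code.

```python
import numpy as np


def ccm_style_loss(fmaps):
    """Assumed CCM-style penalty for one layer.

    fmaps has shape (channels, samples): each row is one flattened feature map.
    The value is lower when channels are more strongly linearly correlated.
    """
    corr = np.corrcoef(fmaps)                  # (channels, channels) Pearson matrix
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    return 1.0 - np.abs(corr[off_diag]).mean()


def pca_retain_count(fmaps, energy=0.99):
    """Assumed PCA rule: number of components explaining `energy` of the variance."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(fmaps))[::-1]  # descending order
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    count = int(np.searchsorted(cumulative, energy)) + 1
    return min(count, len(eigvals))


# Toy data (hypothetical): 4 channels built from only 2 underlying signals,
# compared with 4 statistically independent channels.
rng = np.random.default_rng(0)
base = rng.standard_normal((2, 500))
mix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
redundant = mix @ base
independent = rng.standard_normal((4, 500))

print(ccm_style_loss(redundant), ccm_style_loss(independent))
print(pca_retain_count(redundant), pca_retain_count(independent))
```

In this toy setting the redundant layer gets a lower penalty and a smaller retain count than the independent one, which mirrors the idea of letting each layer's retain ratio follow the redundancy actually present in the trained model rather than a hand-set value.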