Structured pruning is a widely used approach for compressing convolutional neural networks (CNNs), and setting the pruning rate is a fundamental problem in it. Most existing methods either introduce too many additional learnable parameters to assign different pruning rates to different layers, or cannot control the compression rate explicitly. Moreover, because an overly narrow network blocks information flow during training, automatic pruning-rate setting cannot explore a high pruning rate for a specific layer. To overcome these limitations, we propose a novel framework named Layer Adaptive Progressive Pruning (LAPP), which gradually compresses the network during the first few epochs of training from scratch. In particular, LAPP uses an effective and efficient pruning strategy that introduces a learnable threshold for each layer and FLOPs constraints for the network. Guided by both the task loss and the FLOPs constraints, the learnable thresholds are updated dynamically and gradually to accommodate changes in the importance scores during training. The pruning strategy can therefore prune the network gradually and automatically determine an appropriate pruning rate for each layer. In addition, to maintain the expressive power of the pruned layers, we add a lightweight bypass to each convolutional layer to be pruned before training starts, which adds only a small overhead. Our method demonstrates superior performance over previous compression methods on various datasets and backbone architectures. For example, on CIFAR-10, our method compresses ResNet-20 to 40.3% without an accuracy drop. On ImageNet, it reduces the FLOPs of ResNet-18 by 55.6% while increasing top-1 accuracy by 0.21% and top-5 accuracy by 0.40%.
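To make the idea of per-layer thresholds driven by a FLOPs constraint concrete, the following is a minimal sketch and not the LAPP implementation. It assumes a sigmoid soft mask over fixed importance scores, a squared penalty on the remaining FLOPs ratio, and a simplified cost model in which a layer's FLOPs scale linearly with its kept filters. The task loss, the changing importance scores, and the bypass are omitted, and all function names and parameters are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_keep(scores, threshold, temperature):
    """Soft keep-probability per filter: near 1 when score > threshold."""
    return [sigmoid((s - threshold) / temperature) for s in scores]


def flops_ratio(layer_scores, layer_flops, thresholds, temperature=0.1):
    """Estimated fraction of FLOPs remaining under the soft masks.

    Assumption: each layer's FLOPs scale linearly with its kept filters.
    """
    kept = 0.0
    for scores, flops, t in zip(layer_scores, layer_flops, thresholds):
        keep = soft_keep(scores, t, temperature)
        kept += flops * sum(keep) / len(keep)
    return kept / sum(layer_flops)


def fit_thresholds(layer_scores, layer_flops, target,
                   steps=1000, lr=0.05, temperature=0.1):
    """Gradient descent on (flops_ratio - target)^2 w.r.t. each threshold.

    Only the FLOPs penalty is used here; a real training loop would also
    backpropagate the task loss (an omission of this sketch).
    """
    thresholds = [0.0] * len(layer_scores)
    total = sum(layer_flops)
    for _ in range(steps):
        ratio = flops_ratio(layer_scores, layer_flops, thresholds, temperature)
        updated = []
        for scores, flops, t in zip(layer_scores, layer_flops, thresholds):
            keep = soft_keep(scores, t, temperature)
            # d/dt sigmoid((s - t) / T) = -k * (1 - k) / T
            d_ratio = -(flops / total) * sum(k * (1.0 - k) for k in keep) \
                / (len(keep) * temperature)
            grad = 2.0 * (ratio - target) * d_ratio
            updated.append(t - lr * grad)
        thresholds = updated
    return thresholds


def pruning_rates(layer_scores, thresholds):
    """Hard per-layer pruning rate: share of filters at or below the threshold."""
    return [sum(s <= t for s in scores) / len(scores)
            for scores, t in zip(layer_scores, thresholds)]


if __name__ == "__main__":
    # Hypothetical importance scores and relative FLOPs for three layers.
    scores = [[0.1, 0.2, 0.6, 0.9], [0.05, 0.5, 0.7, 0.8], [0.3, 0.4, 0.45, 0.95]]
    flops = [4.0, 2.0, 1.0]
    th = fit_thresholds(scores, flops, target=0.5)
    print("thresholds:", th)
    print("remaining FLOPs ratio:", flops_ratio(scores, flops, th))
    print("per-layer pruning rates:", pruning_rates(scores, th))
```

In this sketch a layer with a larger share of the total FLOPs receives a larger threshold update, so the layers end up with different pruning rates even though only one global budget is specified.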