Structured pruning compresses neural networks by removing channels (filters), which speeds up inference and reduces the run-time memory footprint. Fine-tuning is usually applied to the pruned network to restore accuracy. However, a pruned network retains few parameters, which makes it difficult for fine-tuning to recover the lost accuracy. To address this challenge, we propose a novel method that first linearly over-parameterizes the compact layers of the pruned network to increase the number of parameters available for fine-tuning, and then re-parameterizes them back into the original layers after fine-tuning. Specifically, we expand each convolution/linear layer into several consecutive convolution/linear layers that are equivalent to it, so the current output feature maps are unchanged. Furthermore, we use similarity-preserving knowledge distillation, which encourages each over-parameterized block to learn the immediate data-to-data similarities of the corresponding dense layer and thereby maintain its feature learning ability. We comprehensively evaluate the proposed method on CIFAR-10 and ImageNet, where it significantly outperforms the vanilla fine-tuning strategy, especially at large pruning ratios.
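The expand-then-merge idea for a linear layer can be illustrated with a minimal sketch. It is not the authors' implementation: the identity initialisation, the bias-free layer, and the helper names `expand_linear`, `merge_linear`, and `forward` are all assumptions made for illustration, and the "fine-tuning" step is simulated by hand-edited weights.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expand_linear(weight):
    """Split W (out x in) into two factors with W2 @ W1 == W.

    Assumption: identity initialisation of the first factor; the source
    does not specify which initialisation scheme it uses.
    """
    in_features = len(weight[0])
    w1 = identity(in_features)        # first layer:  in -> in
    w2 = [row[:] for row in weight]   # second layer: in -> out
    return w1, w2


def merge_linear(w1, w2):
    """Re-parameterize two consecutive linear layers into one matrix."""
    return matmul(w2, w1)


def forward(weight, x):
    """Apply y = W x to a single input vector x (no bias)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# A compact (pruned) layer with 3 inputs and 2 outputs.
weight = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
x = [1.0, -2.0, 0.5]

# Expansion leaves the output unchanged.
w1, w2 = expand_linear(weight)
y_dense = forward(weight, x)
y_expanded = forward(w2, forward(w1, x))

# Stand-in for fine-tuning: both factors receive updates.
w1[0][1] += 0.25
w2[1][2] -= 0.5

# After fine-tuning, the two layers collapse back into one.
merged = merge_linear(w1, w2)
y_two_layers = forward(w2, forward(w1, x))
y_merged = forward(merged, x)
```

Because no nonlinearity sits between the two factors, their product is again a single matrix of the original shape, so the merged layer costs the same at inference time as the compact layer. If biases were included, the merged bias would be `W2 b1 + b2`.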
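The similarity-preserving distillation term can be sketched in the same spirit. The abstract does not give the loss formula, so this assumes the commonly used formulation: row-normalised batch Gram matrices of the student and teacher activations, compared under a squared Frobenius norm and divided by the squared batch size. The function names and toy activations are hypothetical.

```python
import math


def similarity_matrix(features):
    """Row-normalised Gram matrix G = Q Q^T for a batch of flattened activations."""
    gram = [[sum(a * b for a, b in zip(u, v)) for v in features] for u in features]
    normed = []
    for row in gram:
        norm = math.sqrt(sum(g * g for g in row)) or 1.0  # guard all-zero rows
        normed.append([g / norm for g in row])
    return normed


def sp_loss(student_feats, teacher_feats):
    """Squared Frobenius distance between the two similarity matrices, over b^2."""
    b = len(student_feats)
    g_student = similarity_matrix(student_feats)
    g_teacher = similarity_matrix(teacher_feats)
    total = sum(
        (s - t) ** 2
        for row_s, row_t in zip(g_student, g_teacher)
        for s, t in zip(row_s, row_t)
    )
    return total / (b * b)


# Toy batch of 3 samples: dense-layer (teacher) activations with 4 features.
teacher = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 0.0], [2.0, 0.0, 4.0, 2.0]]

# A student whose activations are a rescaled copy has the same similarities.
student_scaled = [[3.0 * v for v in row] for row in teacher]

# A pruned student with fewer features still yields a 3 x 3 similarity matrix.
student_small = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

loss_scaled = sp_loss(student_scaled, teacher)
loss_small = sp_loss(student_small, teacher)
```

The similarity matrix has shape batch by batch regardless of the feature width, which is why a pruned layer with fewer channels can be compared directly against its dense counterpart.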