Many recent works have shown that trainability plays a central role in neural network pruning: left unattended, broken trainability can lead to severe under-performance and unintentionally amplify the effect of the retraining learning rate, resulting in biased (or even misinterpreted) benchmark results. This paper introduces trainability preserving pruning (TPP), a scalable method that preserves network trainability against pruning, aiming for improved pruning performance and greater robustness to retraining hyper-parameters (e.g., learning rate). Specifically, we propose to penalize the Gram matrix of convolutional filters to decorrelate the pruned filters from the retained filters. Beyond the convolutional layers, in the spirit of preserving the trainability of the whole network, we also propose to regularize the batch normalization parameters (scale and bias). Empirical studies on linear MLP networks show that TPP can perform on par with the oracle trainability recovery scheme. On nonlinear ConvNets (ResNet56/VGG19) on CIFAR10/100, TPP outperforms competing approaches by a clear margin. Moreover, results on ImageNet-1K with ResNets suggest that TPP consistently compares favorably with other top-performing structured pruning approaches. Code: https://github.com/MingSun-Tse/TPP.
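To make the two regularizers concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the exact loss, so the masking choice (penalizing only Gram entries that pair a pruned filter with a retained one), the plain squared penalty on batch normalization parameters, and the names `gram_decorrelation_penalty`, `bn_penalty`, and `kept_mask` are all assumptions made for illustration.

```python
import numpy as np


def gram_decorrelation_penalty(weight, kept_mask):
    """Sum of squared Gram entries that pair a pruned filter with a retained one.

    weight:    conv weight of shape (num_filters, in_channels, kh, kw)
    kept_mask: boolean vector, True for retained filters (hypothetical input)
    NOTE: the choice of mask is an assumption; the source only states that the
    Gram matrix is penalized to decorrelate pruned from retained filters.
    """
    num_filters = weight.shape[0]
    flat = weight.reshape(num_filters, -1)   # one row per filter
    gram = flat @ flat.T                     # (num_filters, num_filters)
    kept = np.asarray(kept_mask, dtype=bool)
    cross = np.outer(~kept, kept)            # True where row is pruned, column is kept
    return float(np.sum((gram * cross) ** 2))


def bn_penalty(scale, bias, kept_mask):
    """Squared magnitude of BN scale and bias on pruned channels (assumed form)."""
    kept = np.asarray(kept_mask, dtype=bool)
    return float(np.sum(scale[~kept] ** 2) + np.sum(bias[~kept] ** 2))


# Toy usage: three 1x2x2 filters, the last one marked for pruning.
kept_mask = np.array([True, True, False])
orthogonal = np.eye(3, 4).reshape(3, 1, 2, 2)      # mutually orthogonal filters
correlated = orthogonal.copy()
correlated[2] = orthogonal[0]                      # pruned filter copies a kept one

penalty_orthogonal = gram_decorrelation_penalty(orthogonal, kept_mask)
penalty_correlated = gram_decorrelation_penalty(correlated, kept_mask)
```

In this toy case the penalty vanishes when the pruned filter is orthogonal to every retained filter and becomes positive once it duplicates one of them, which is the decorrelation behavior the abstract describes. In training, such terms would be added to the task loss with some weighting; that weighting is not specified in the text above.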