Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning

Network pruning is a widely used technique for reducing the computation cost and model size of deep neural networks. However, the typical three-stage pipeline (training, pruning, and retraining) significantly increases the overall training time. In this paper, we develop a systematic weight-pruning optimization approach based on surrogate Lagrangian relaxation (SLR), which is tailored to overcome the difficulties caused by the discrete nature of the weight-pruning problem. We prove that our method ensures fast convergence of the model compression problem, and that the convergence of SLR is accelerated by quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate our method on image classification tasks using CIFAR-10 and ImageNet with the state-of-the-art MLP-Mixer and Swin Transformer, as well as VGG-16, ResNet-18, ResNet-50, ResNet-110, and MobileNetV2. We also evaluate object detection and segmentation tasks on the COCO and KITTI benchmarks and the TuSimple lane detection dataset using a variety of models. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement, and higher accuracy under the same compression rate requirement. On classification tasks, our SLR approach converges to the desired accuracy $3\times$ faster on both datasets. On object detection and segmentation tasks, SLR converges to the desired accuracy $2\times$ faster. Further, SLR achieves high model accuracy even at the hard-pruning stage without retraining, which reduces the traditional three-stage pruning pipeline to a two-stage process. Given a limited budget of retraining epochs, our approach quickly recovers the model's accuracy.
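To make the "hard-pruning stage" concrete, the following is a minimal sketch of one common form of hard pruning: keep the largest-magnitude weights of a layer and set the rest to zero. It is an illustration under stated assumptions, not the SLR procedure itself. The function name `hard_prune`, the `sparsity` parameter, and the choice of unstructured magnitude-based selection are all assumptions made for this example.

```python
import numpy as np


def hard_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    Hypothetical helper for illustration only: `sparsity` is the fraction of
    entries to remove (0.0 keeps everything, 0.9 removes 90%).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")

    num_pruned = int(round(sparsity * weights.size))
    if num_pruned == 0:
        return weights.copy()

    magnitudes = np.abs(weights).ravel()
    # Indices of the `num_pruned` smallest magnitudes (order within them is arbitrary).
    prune_idx = np.argpartition(magnitudes, num_pruned - 1)[:num_pruned]

    # Boolean mask: False for pruned entries, True for kept entries.
    keep_mask = np.ones(weights.size, dtype=bool)
    keep_mask[prune_idx] = False
    return weights * keep_mask.reshape(weights.shape)


if __name__ == "__main__":
    layer = np.array([[0.1, -2.0], [0.05, 3.0]])
    print(hard_prune(layer, sparsity=0.5))
```

In a conventional pipeline, the accuracy lost at this step is recovered by retraining the surviving weights. The abstract's claim is that SLR-trained weights already sit close enough to a sparse solution that accuracy stays high immediately after such a step.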
