Soft threshold pruning is among the cutting-edge pruning methods with state-of-the-art performance. However, previous methods either perform aimless searches over the threshold scheduler or simply make the threshold trainable, and they lack a unified theoretical explanation. In this work, we reformulate soft threshold pruning as an implicit optimization problem solved by the Iterative Shrinkage-Thresholding Algorithm (ISTA), a classic method from sparse recovery and compressed sensing. Under this theoretical framework, all threshold tuning strategies proposed in previous studies of soft threshold pruning can be interpreted as different ways of tuning the $L_1$-regularization term. Through an in-depth study of threshold scheduling within this framework, we further derive an optimal threshold scheduler. This scheduler keeps the $L_1$-regularization coefficient stable, which, from an optimization perspective, implies a time-invariant objective function. In principle, the derived pruning algorithm can sparsify any mathematical model trained via SGD. We conduct extensive experiments on ImageNet and verify its state-of-the-art performance on both Artificial Neural Networks (ResNet-50 and MobileNet-V1) and Spiking Neural Networks (SEW ResNet-18). Building on this framework, we derive a family of pruning methods, including sparsify-during-training, early pruning, and pruning at initialization. The code is available at https://github.com/Yanqi-Chen/LATS.
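To make the ISTA connection concrete, recall the textbook ISTA update for an objective $L(w) + \lambda \|w\|_1$: take a gradient step with step size $\eta_t$, then apply the soft-thresholding operator $S_\tau(x) = \operatorname{sign}(x)\max(|x| - \tau, 0)$ with threshold $\tau = \eta_t \lambda$, i.e. $w_{t+1} = S_{\eta_t \lambda}(w_t - \eta_t \nabla L(w_t))$. The sketch below is a minimal NumPy illustration of plain ISTA on a toy least-squares (Lasso) problem, assuming a quadratic loss in place of a network's training loss. The names `soft_threshold`, `ista`, and `lr_schedule` are hypothetical, and this is not the LATS implementation from the linked repository. One way to read the "stable $L_1$ coefficient" idea is that when the threshold is tied to the step size as $\eta_t \lambda$, the coefficient $\lambda$ stays fixed even if $\eta_t$ changes; this reading is an interpretation, not a statement of the paper's exact scheduler.

```python
import numpy as np


def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1: shrink every entry toward zero by tau."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)


def ista(A, b, lam, lr_schedule):
    """Minimize 0.5 * ||A w - b||^2 + lam * ||w||_1 with ISTA.

    lr_schedule is a hypothetical sequence of step sizes eta_t; each should
    satisfy eta_t <= 1 / L, where L is the largest eigenvalue of A^T A.
    The threshold at step t is eta_t * lam, so the L1 coefficient lam stays
    the same even when the step size changes.
    """
    w = np.zeros(A.shape[1])
    for eta in lr_schedule:
        grad = A.T @ (A @ w - b)                       # gradient of the smooth loss
        w = soft_threshold(w - eta * grad, eta * lam)  # gradient step, then shrink
    return w


if __name__ == "__main__":
    # Toy sparse-recovery problem: 50 noiseless measurements of a 3-sparse vector.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    w_true = np.zeros(20)
    w_true[[2, 7, 11]] = [3.0, -2.0, 1.5]
    b = A @ w_true
    L = np.linalg.norm(A, 2) ** 2       # spectral norm squared = largest eigenvalue of A^T A
    etas = [1.0 / L] * 500              # constant step size
    w_hat = ista(A, b, lam=0.5, lr_schedule=etas)
    print("nonzero indices:", np.nonzero(w_hat)[0])
```

Entries whose magnitude stays below the threshold are set exactly to zero, which is what makes soft thresholding act as pruning rather than mere weight decay.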