Advancing Model Pruning via Bi-level Optimization

Deployment constraints in practical applications necessitate pruning large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential to improve their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method for successfully finding 'winning tickets'. Yet the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce this overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as those found by IMP. This raises the question of how to close the gap between pruning accuracy and pruning efficiency. To tackle it, we pursue the algorithmic advancement of model pruning. Specifically, we formulate the pruning problem from a fresh viewpoint, bi-level optimization (BLO). We show that the BLO interpretation provides a technically grounded optimization basis for an efficient implementation of the pruning-retraining learning paradigm used in IMP. We also show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure. By leveraging this bi-linearity, we theoretically show that BiP can be solved as easily as first-order optimization, thus inheriting its computation efficiency. Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BiP can find better winning tickets than IMP in most cases and is computationally as efficient as the one-shot pruning schemes, achieving a 2-7 times speedup over IMP at the same level of model accuracy and sparsity.
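
As a rough illustration of the bi-level view described above, the pruning-retraining paradigm can be sketched as the following problem. The notation is assumed here for illustration and does not come from the abstract: m is a binary pruning mask, \theta the model weights, \ell the training loss, k the number of weights kept, and the quadratic regularizer with coefficient \gamma is a hypothetical choice for stabilizing the lower-level problem.

```latex
% Upper level: choose the mask m (pruning).
% Lower level: retrain the weights theta for that mask.
% All symbols (m, theta, ell, k, gamma) are illustrative assumptions.
\begin{aligned}
\min_{m \in \mathcal{S}} \quad & \ell\bigl(m \odot \theta^{*}(m)\bigr) \\
\text{subject to} \quad & \theta^{*}(m) = \arg\min_{\theta \in \mathbb{R}^{n}}
    \; \ell(m \odot \theta) + \frac{\gamma}{2}\,\lVert \theta \rVert_2^{2}, \\
\text{where} \quad & \mathcal{S} = \bigl\{\, m \in \{0,1\}^{n} : \mathbf{1}^{\top} m \le k \,\bigr\}.
\end{aligned}
```

In this sketch, the two variables enter the loss only through the product m \odot \theta, which is linear in m for fixed \theta and linear in \theta for fixed m. This is one way to read the bi-linear structure that the abstract credits for reducing BiP to first-order computation.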
