Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods provide no performance guarantees and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled, data-efficient structured pruning method based on submodular optimization. In particular, for a given layer, we select the neurons/channels to prune, together with corresponding new weights for the next layer, so as to minimize the change in the next layer's input induced by pruning. We show that this selection problem is a weakly submodular maximization problem, so it can be provably approximated using an efficient greedy algorithm. Under reasonable assumptions, our method guarantees that the error between the outputs of the original and pruned models decreases exponentially w.r.t. the pruned size. Our method is also one of the few in the literature that use only a limited number of training samples and no labels. Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime.
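The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the change in the next layer's input is measured by a squared Frobenius norm over a small batch of unlabeled activations, and that the new next-layer weights are obtained by ordinary least squares. The names `reconstruction_error` and `greedy_prune`, and the arrays `A` and `W`, are hypothetical and introduced only for illustration.

```python
import numpy as np

def reconstruction_error(A, W, kept):
    """Error in the next layer's input when only the `kept` neurons remain.

    A: (n_samples, n_neurons) activations of the layer being pruned (assumed).
    W: (n_neurons, n_out) original weights of the next layer (assumed).
    Returns the squared error and the refitted next-layer weights.
    """
    target = A @ W  # next layer's input before pruning
    if not kept:
        return float(np.sum(target ** 2)), np.zeros((0, W.shape[1]))
    A_S = A[:, kept]
    # Assumption: new weights are the least-squares fit on the kept neurons.
    W_new, *_ = np.linalg.lstsq(A_S, target, rcond=None)
    err = float(np.sum((target - A_S @ W_new) ** 2))
    return err, W_new

def greedy_prune(A, W, n_keep):
    """Greedily pick `n_keep` neurons to keep; all other neurons are pruned."""
    kept = []
    remaining = list(range(A.shape[1]))
    for _ in range(n_keep):
        # Add the neuron whose inclusion gives the lowest reconstruction error.
        best = min(remaining,
                   key=lambda j: reconstruction_error(A, W, kept + [j])[0])
        kept.append(best)
        remaining.remove(best)
    err, W_new = reconstruction_error(A, W, kept)
    return kept, W_new, err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 8))   # 50 unlabeled samples, 8 neurons
    W = rng.standard_normal((8, 3))    # next layer with 3 outputs
    kept, W_new, err = greedy_prune(A, W, n_keep=4)
    print("kept neurons:", kept, "squared error:", err)
```

This naive version refits the least-squares problem for every candidate neuron, which is adequate for illustration but would need incremental updates to scale to real layers. No labels are used: only the activations `A` computed from a small set of inputs.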