Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning

Neural network pruning has grown popular over the last decade, after it was shown that a large number of weights can be safely removed from modern neural networks without compromising accuracy. Numerous pruning methods have been proposed since then, each claiming to improve on its predecessors. Many state-of-the-art (SOTA) techniques today rely on complex pruning methodologies, such as importance scores, feedback obtained through back-propagation, or heuristics-based pruning rules. In this work, we question whether this pattern of introducing complexity is really necessary to achieve better pruning results. We benchmark these SOTA techniques against a naive pruning baseline, namely Global Magnitude Pruning (Global MP). Global MP ranks all weights in the network by magnitude and prunes the smallest ones. Hence, in its vanilla form, it is one of the simplest pruning techniques. Surprisingly, we find that vanilla Global MP outperforms all the other SOTA techniques and achieves a new SOTA result. It also achieves promising performance on FLOPs sparsification, which we find improves further when pruning is conducted gradually. We also find that Global MP generalizes across tasks, datasets, and models with superior performance. Moreover, layer-collapse, a common issue that many pruning algorithms run into at high sparsity rates, can be easily fixed in Global MP by setting a minimum threshold of weights to be retained in each layer. Lastly, unlike many other SOTA techniques, Global MP does not require any additional algorithm-specific hyper-parameters and is straightforward to tune and implement. We showcase our findings on various models (WRN-28-8, ResNet-32, ResNet-50, MobileNet-V1 and FastGRNN) and multiple datasets (CIFAR-10, ImageNet and HAR-2). Code is available at https://github.com/manasgupta-1/GlobalMP.
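To make the ranking step and the per-layer minimum threshold concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `global_magnitude_prune`, the `min_keep` parameter, the representation of layers as flat lists, and the way the floor is enforced (skipping layers that have already reached it) are all assumptions made for illustration. A practical version would operate on framework tensors rather than lists.

```python
from typing import Dict, List


def global_magnitude_prune(
    layers: Dict[str, List[float]],
    sparsity: float,
    min_keep: int = 0,
) -> Dict[str, List[int]]:
    """Return a 0/1 keep-mask per layer (1 = keep, 0 = pruned).

    Hypothetical helper: `layers` maps a layer name to its flattened
    weights, `sparsity` is the fraction of all weights to remove, and
    `min_keep` is an assumed per-layer floor on retained weights.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    total = sum(len(weights) for weights in layers.values())
    n_prune = int(sparsity * total)

    # Pool every weight from every layer into one global ranking,
    # sorted by magnitude in ascending order.
    ranked = sorted(
        (abs(w), name, i)
        for name, weights in layers.items()
        for i, w in enumerate(weights)
    )

    masks = {name: [1] * len(weights) for name, weights in layers.items()}
    remaining = {name: len(weights) for name, weights in layers.items()}

    pruned = 0
    for _, name, i in ranked:
        if pruned == n_prune:
            break
        # Assumed floor rule: a layer already down to `min_keep`
        # weights is skipped, so it cannot collapse entirely.
        if remaining[name] <= min_keep:
            continue
        masks[name][i] = 0
        remaining[name] -= 1
        pruned += 1
    return masks


# Toy network: "fc" holds much smaller weights than "conv1".
toy = {
    "conv1": [0.9, -0.8, 0.7, 0.6],
    "fc": [0.05, -0.04, 0.03, 0.02],
}
collapsed = global_magnitude_prune(toy, sparsity=0.5)
protected = global_magnitude_prune(toy, sparsity=0.5, min_keep=1)
```

In this toy case, pruning half the weights without a floor removes every weight in `fc`, which is the layer-collapse situation described above. With `min_keep=1`, the largest `fc` weight survives and the pruning budget is instead spent on the smallest weight in `conv1`. Note that if the floor makes the requested sparsity unreachable, this sketch simply prunes fewer weights than asked for.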
