Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date, only ad hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of the expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to the empirical error after optimization. Finally, we provide an extensive experimental comparison, training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with a very high compression ratio might pruning be beneficial from an accuracy standpoint.
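To make the kind of error comparison described above concrete, the following is a minimal sketch and not the paper's implementation. It measures the signal-to-noise ratio of magnitude pruning and of symmetric uniform quantization on synthetic standard-normal weights. Everything specific here is an assumption made for illustration: the 16-bit baseline (so that b-bit quantization is matched with a sparsity of 1 - b/16), the min-max quantization range, the Gaussian weights, the decision to ignore the storage cost of the pruning mask, and the helper names `prune_magnitude`, `quantize_uniform`, and `snr_db`.

```python
import numpy as np


def prune_magnitude(w, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(w) <= threshold] = 0.0
    return out


def quantize_uniform(w, bits):
    """Symmetric uniform quantization with a min-max range (an assumed scheme)."""
    levels = 2 ** (bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    scale = np.max(np.abs(w)) / levels     # step size of the integer grid
    return np.clip(np.round(w / scale), -levels, levels) * scale


def snr_db(w, w_hat):
    """Signal-to-noise ratio in dB between original and compressed weights."""
    return 10.0 * np.log10(np.sum(w ** 2) / np.sum((w - w_hat) ** 2))


rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)           # synthetic stand-in for a weight tensor

results = {}
for bits in (8, 4):
    sparsity = 1.0 - bits / 16.0           # assumed equal-compression pairing
    results[bits] = (
        snr_db(w, quantize_uniform(w, bits)),
        snr_db(w, prune_magnitude(w, sparsity)),
    )
    print(f"{bits}-bit quantization: {results[bits][0]:.1f} dB | "
          f"{sparsity:.0%} sparsity: {results[bits][1]:.1f} dB")
```

On this toy Gaussian input, quantization yields the higher signal-to-noise ratio at both matched settings. This illustrates the style of comparison only; it does not reproduce the paper's analysis, bounds, or experimental results.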