Modern neural networks are often massively overparameterized, leading to high compute costs during training and at inference. One effective way to improve both the compute and energy efficiency of neural networks while maintaining good performance is structured pruning, in which entire network structures (e.g.~neurons or convolutional filters) that have limited impact on the model output are removed. In this work, we propose Bayesian Model Reduction for Structured Pruning (BMRS), a fully end-to-end Bayesian method of structured pruning. BMRS builds on two recent methods: Bayesian structured pruning with multiplicative noise, and Bayesian model reduction (BMR), a method that allows efficient comparison of Bayesian models under a change in prior. We present two realizations of BMRS derived from different priors, which yield different structured-pruning characteristics: 1) BMRS_N, with a truncated log-normal prior, which offers reliable compression rates and accuracy without the need to tune any thresholds, and 2) BMRS_U, with a truncated log-uniform prior, which can achieve more aggressive compression based on the boundaries of truncation. Overall, we find that BMRS offers a theoretically grounded approach to structured pruning of neural networks, yielding both high compression rates and high accuracy. Experiments on multiple datasets and neural networks of varying complexity show that the two BMRS methods offer a competitive performance-efficiency trade-off compared to other pruning methods.
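As background for the BMR step named above (this is context, not a claim made in the abstract itself): BMR rests on a standard model-reduction identity, usually attributed to Friston and colleagues, which expresses the evidence of a model with a reduced prior $\tilde{p}(\theta)$ in terms of the original prior $p(\theta)$, its evidence $p(y)$, and the fitted approximate posterior $q(\theta)$, so that candidate priors can be compared without retraining:

```latex
% Standard Bayesian model reduction identity: the evidence of a model
% with reduced prior \tilde{p}(\theta) follows from the original
% posterior q(\theta) alone, with no refitting of the network.
\[
  \ln \tilde{p}(y)
  \;=\;
  \ln p(y)
  \;+\;
  \ln \int q(\theta)\, \frac{\tilde{p}(\theta)}{p(\theta)}\, d\theta .
\]
```

In the BMRS setting, a reduced prior that concentrates a structure's multiplicative noise near zero corresponds to switching that structure off, and the structure is pruned when the resulting log-evidence change favors the reduction.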
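Because the abstract only names the ingredients, a minimal sketch of the resulting per-unit pruning decision may help. The snippet below is an illustrative Monte Carlo version, assuming log-normal stand-ins for the priors and a prune-if-nonnegative rule; the paper's BMRS_N and BMRS_U rules are derived specifically for truncated log-normal and truncated log-uniform priors, so every distribution, parameter, and function name here is a hypothetical placeholder rather than the method itself.

```python
# Hedged sketch: a BMR-style pruning score for units with multiplicative
# noise. Priors, posteriors, and the >= 0 rule are illustrative assumptions,
# not the exact BMRS_N / BMRS_U derivations from the paper.
import numpy as np

def log_lognormal_pdf(x, mu, sigma):
    """Log-density of a log-normal distribution with parameters (mu, sigma)."""
    return (-np.log(x * sigma * np.sqrt(2.0 * np.pi))
            - (np.log(x) - mu) ** 2 / (2.0 * sigma ** 2))

def log_evidence_change(theta_samples, log_p_old, log_p_new):
    """Monte Carlo estimate of ln E_q[p_new(theta)/p_old(theta)], the BMR
    log-evidence change, computed with log-sum-exp for numerical stability.
    theta_samples are draws from the unit's fitted posterior q(theta)."""
    log_ratio = log_p_new(theta_samples) - log_p_old(theta_samples)
    m = log_ratio.max()
    return m + np.log(np.mean(np.exp(log_ratio - m)))

rng = np.random.default_rng(0)

# Original broad prior over a unit's multiplicative noise theta > 0, and a
# reduced prior concentrated near theta = 0 that effectively switches the
# unit off (both log-normal stand-ins with hypothetical parameters).
log_p_old = lambda x: log_lognormal_pdf(x, mu=0.0, sigma=1.0)
log_p_new = lambda x: log_lognormal_pdf(x, mu=-6.0, sigma=0.5)

# Two hypothetical fitted posteriors: a near-dead unit and an active one.
units = {
    "near-dead unit": rng.lognormal(mean=-5.5, sigma=0.3, size=10_000),
    "active unit":    rng.lognormal(mean=-0.1, sigma=0.3, size=10_000),
}
for name, samples in units.items():
    delta = log_evidence_change(samples, log_p_old, log_p_new)
    print(f"{name}: delta log-evidence = {delta:8.1f} -> prune: {delta >= 0}")
```

The log-sum-exp step matters in practice: the prior ratio can span hundreds of orders of magnitude across units, so averaging raw ratios would overflow or underflow.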