Neural network sparsification is a promising avenue for reducing computation time and memory costs, especially in an age where many successful AI models are becoming too large to na\"ively deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of a network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a pruning framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors to make neural networks more sparsifiable. Our approach implements an automatic Occam's razor that selects the most sparsifiable model that still explains the data well, for both structured and unstructured sparsification. In addition, we show that the posterior Hessian approximation pre-computed for the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. Finally, we demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of neural network architectures and datasets.
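For context, the following is a minimal sketch of the quantities the abstract refers to, using standard Laplace-approximation formulas; it illustrates the general form of such an objective and criterion, not the paper's exact formulation. With a MAP estimate $\theta_*$ over $D$ parameters and posterior Hessian $H = -\nabla^2_\theta \log p(\mathcal{D}, \theta) \big|_{\theta = \theta_*}$, the Laplace approximation to the log marginal likelihood is
\[
\log p(\mathcal{D}) \;\approx\; \log p(\mathcal{D} \mid \theta_*) + \log p(\theta_*) + \frac{D}{2} \log 2\pi - \frac{1}{2} \log \det H,
\]
where the $\log \det H$ term penalizes high-curvature solutions and thereby acts as an automatic Occam's razor. Re-using the same Hessian under a diagonal approximation, an OBD-style score $s_i = \tfrac{1}{2} \theta_{*,i}^2 H_{ii}$ estimates the loss increase from setting weight $i$ to zero, yielding a pruning criterion that requires no computation beyond the already-fitted Laplace approximation.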