A comprehensive study of spike and slab shrinkage priors for structurally sparse Bayesian neural networks

Network complexity and computational efficiency have become increasingly significant aspects of deep learning. Sparse deep learning addresses these challenges by reducing heavily over-parameterized deep neural networks to recover a sparse representation of the underlying target function. Specifically, deep neural architectures compressed via structured sparsity (e.g., node sparsity) provide low-latency inference, higher data throughput, and reduced energy consumption. In this paper, we explore two well-established shrinkage techniques, Lasso and Horseshoe, for model compression in Bayesian neural networks. To this end, we propose structurally sparse Bayesian neural networks that systematically prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL) and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors, and we develop computationally tractable variational inference, including a continuous relaxation of the Bernoulli variables. We establish the contraction rates of the variational posterior of our proposed models as a function of the network topology, layer-wise node cardinalities, and bounds on the network weights. We empirically demonstrate that our models are competitive with the baseline models in prediction accuracy, model compression, and inference latency.
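The abstract mentions a continuous relaxation of the Bernoulli variables that switch nodes on and off. A minimal sketch of one common way to do this is given here, assuming a binary-concrete (Gumbel-softmax) relaxation; the abstract does not say which relaxation the authors use, and the function names, layer shapes, ReLU activation, and temperature value are all hypothetical choices made for illustration.

```python
import numpy as np

def relaxed_bernoulli_sample(logit, temperature, rng):
    """Binary-concrete relaxation of Bernoulli(sigmoid(logit)) (assumed, not from the paper)."""
    # Clip the uniform draw away from 0 and 1 so the logs stay finite.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(logit))
    logistic_noise = np.log(u) - np.log1p(-u)
    # As temperature -> 0 the sample concentrates near 0 or 1.
    return 1.0 / (1.0 + np.exp(-(logit + logistic_noise) / temperature))

def sparse_layer_forward(x, weights, node_logits, temperature, rng):
    """Forward pass in which each output node is scaled by a relaxed inclusion gate."""
    gates = relaxed_bernoulli_sample(node_logits, temperature, rng)  # shape (n_out,)
    # Broadcasting multiplies every incoming weight of node j by gates[j],
    # so a gate near 0 effectively removes the whole node (structured sparsity).
    pre_activation = x @ (weights * gates)
    return np.maximum(pre_activation, 0.0), gates  # ReLU is an illustrative choice

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # 4 inputs with 3 features each
weights = rng.normal(size=(3, 5))      # layer with 5 output nodes
node_logits = np.array([4.0, 4.0, -4.0, -4.0, 0.0])  # hypothetical variational parameters
out, gates = sparse_layer_forward(x, weights, node_logits, temperature=0.5, rng=rng)
```

Because the gate is a differentiable function of `node_logits`, the inclusion probabilities can in principle be trained by gradient-based variational inference alongside the weights. The slab part of the prior (Group Lasso or Group Horseshoe) is not modelled in this sketch.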

