Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification addresses this demand by facilitating the design of models that are both computationally efficient and competitive in performance across a range of deep learning applications. The state of the art in Bayesian sparsification of deep neural networks combines structural shrinkage priors on model weights with an approximate inference scheme based on stochastic variational inference. However, inverting the full generative model is computationally very demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning model weights. As a generalization of the Savage-Dickey ratio, BMR allows post-hoc elimination of redundant model weights based on the posterior estimates obtained under a straightforward (non-hierarchical) generative model. Our comparative study highlights the advantages of BMR relative to established approaches based on hierarchical horseshoe priors over model weights. We illustrate the potential of BMR across various deep learning architectures, from classical networks such as LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
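The post-hoc pruning step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: each weight has an independent Gaussian prior and a fully factorized Gaussian approximate posterior, and a weight is pruned when replacing its prior with a very narrow one centered at zero does not lower the model evidence. The function names, the default variances, and the decision threshold of zero are hypothetical choices made for illustration; the authors' actual procedure may differ.

```python
import math


def delta_log_evidence(post_mean, post_var, prior_var, reduced_var,
                       prior_mean=0.0, reduced_mean=0.0):
    """Change in log evidence when the prior N(prior_mean, prior_var) is
    replaced by N(reduced_mean, reduced_var), given the Gaussian posterior
    N(post_mean, post_var) obtained under the original prior.

    Assumption: all three densities are univariate Gaussians, so the
    integral of posterior * reduced_prior / prior has a closed form.
    """
    p_post, p_prior, p_red = 1.0 / post_var, 1.0 / prior_var, 1.0 / reduced_var
    # Precision and mean of the posterior under the reduced prior.
    p_new = p_post + p_red - p_prior
    if p_new <= 0.0:
        raise ValueError("reduced posterior precision must be positive")
    m_new = (p_post * post_mean + p_red * reduced_mean
             - p_prior * prior_mean) / p_new
    log_norm = 0.5 * math.log(p_post * p_red / (p_prior * p_new))
    quad = (p_post * post_mean ** 2 + p_red * reduced_mean ** 2
            - p_prior * prior_mean ** 2 - p_new * m_new ** 2)
    return log_norm - 0.5 * quad


def savage_dickey_log_ratio(post_mean, post_var, prior_var, prior_mean=0.0):
    """ln q(0) - ln p(0): the limit of delta_log_evidence as the reduced
    prior collapses to a point mass at zero."""
    def log_pdf_at_zero(mean, var):
        return -0.5 * math.log(2 * math.pi * var) - 0.5 * mean ** 2 / var
    return (log_pdf_at_zero(post_mean, post_var)
            - log_pdf_at_zero(prior_mean, prior_var))


def prune_mask(post_means, post_vars, prior_var=1.0, reduced_var=1e-8):
    """True where removing the weight does not lower the model evidence.

    prior_var and reduced_var are illustrative defaults, not values taken
    from the source.
    """
    return [delta_log_evidence(m, v, prior_var, reduced_var) >= 0.0
            for m, v in zip(post_means, post_vars)]
```

For example, `prune_mask([0.01, 2.0], [0.04, 0.01])` returns `[True, False]`: the first weight's posterior sits close to zero relative to its uncertainty and is pruned, while the second is confidently non-zero and is kept. With a very small `reduced_var`, `delta_log_evidence` approaches `savage_dickey_log_ratio`, which is the sense in which this computation generalizes the Savage-Dickey ratio. No hierarchical model has to be re-fitted; only the posterior estimates from the original model are used.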