Deep learning's immense capabilities are often constrained by the complexity of its models, which has driven a growing demand for effective sparsification techniques. Bayesian sparsification has emerged as a key approach for designing models that are computationally efficient while remaining competitive in performance across a wide range of deep learning applications. The current state of the art in Bayesian sparsification of deep neural networks combines structural shrinkage priors on model weights with approximate inference based on black-box stochastic variational inference (SVI). However, inverting the full generative model is computationally very demanding, especially compared with standard deep learning of point estimates. In this context, we advocate Bayesian model reduction (BMR) as a more efficient alternative for pruning model weights. As a generalization of the Savage-Dickey ratio, BMR enables post-hoc elimination of redundant model weights based on posterior estimates obtained under a simple (non-hierarchical) generative model. Our comparative study highlights the computational efficiency and pruning rate of BMR relative to the established SVI scheme applied to the full hierarchical generative model. We illustrate the potential of BMR to prune model parameters across a variety of deep learning architectures, from classical networks such as LeNet to modern architectures such as Vision Transformers and MLP-Mixers.
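To make the post-hoc pruning step concrete, the following is a minimal sketch of the standard closed-form BMR expression for Gaussians. It assumes a zero-mean Gaussian prior and a fully factorized (mean-field) Gaussian posterior over each weight, and it computes the change in log evidence when the prior is shrunk to a narrower zero-mean Gaussian. In the limit of zero reduced variance, this expression reduces to the Savage-Dickey ratio ln q(0) - ln p(0). The function name `bmr_delta_log_evidence` and its parameters are hypothetical. The abstract does not specify that this exact parameterization is the one used in the study.

```python
import numpy as np


def bmr_delta_log_evidence(mu, sigma, prior_sigma, reduced_sigma=0.0):
    """Per-weight change in log evidence when the prior N(0, prior_sigma^2)
    is replaced by a reduced prior N(0, reduced_sigma^2).

    Assumptions (illustrative sketch): zero-mean Gaussian priors and a
    mean-field Gaussian posterior N(mu, sigma^2) for each weight.
    reduced_sigma = 0 means a point mass at zero (weight switched off),
    which gives the Savage-Dickey ratio log q(0) - log p(0).
    Positive values favour the reduced (pruned) model.
    """
    if reduced_sigma > prior_sigma:
        raise ValueError("reduced prior must not be wider than the full prior")
    mu = np.asarray(mu, dtype=float)
    pi = 1.0 / np.asarray(sigma, dtype=float) ** 2  # posterior precision
    pi0 = 1.0 / prior_sigma ** 2                     # full prior precision
    if reduced_sigma == 0.0:
        # Point-mass limit: log N(0; mu, sigma^2) - log N(0; 0, prior_sigma^2)
        return 0.5 * np.log(pi / pi0) - 0.5 * pi * mu ** 2
    pi_r = 1.0 / reduced_sigma ** 2                  # reduced prior precision
    pi_hat = pi + pi_r - pi0                         # reduced posterior precision (> 0 here)
    mu_hat = pi * mu / pi_hat                        # reduced posterior mean
    return (0.5 * np.log(pi_r * pi / (pi0 * pi_hat))
            - 0.5 * (pi * mu ** 2 - pi_hat * mu_hat ** 2))


# Example: posterior means/stds for four weights under a N(0, 1) prior.
mu = np.array([0.01, 1.5, -0.02, -0.8])
sigma = np.full(4, 0.1)
delta = bmr_delta_log_evidence(mu, sigma, prior_sigma=1.0)
keep_mask = delta <= 0  # keep weights whose removal would lower the evidence
```

Here the two weights with posterior means close to zero relative to their posterior uncertainty get a positive ΔF and are pruned. The two clearly non-zero weights are kept. Because the score depends only on the fitted posterior and the two priors, it can be evaluated after training without refitting the model. This post-hoc character is what makes BMR cheap compared with running inference on a full hierarchical model.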