Model compression has drawn much attention within the deep learning community in recent years. Compressing a dense neural network offers many advantages, including lower computation cost, deployability on devices with limited storage and memory, and resistance to adversarial attacks. Compression may be achieved via weight pruning or by fully discarding certain input features. Here we demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup. Given a fully connected Bayesian neural network with spike-and-slab priors trained via a variational algorithm, we obtain the posterior inclusion probability for every node, information that is typically lost. We employ these probabilities for pruning and feature selection on a host of simulated and real-world benchmark datasets and find evidence of better generalizability of the pruned model in all our experiments.
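The pruning rule the abstract describes can be illustrated in a few lines. The sketch below is a minimal, hypothetical example, not the paper's implementation: it assumes a mean-field variational posterior in which each hidden node carries a Bernoulli inclusion variable, so the variational probability `q_j` plays the role of the posterior inclusion probability; nodes with `q_j` below a threshold (0.5, as in the median-probability-model rule) are pruned. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned variational parameters for one hidden layer:
# each node j has a Bernoulli inclusion indicator z_j whose variational
# probability q_j serves as the posterior inclusion probability.
logits = rng.normal(size=8)           # unconstrained variational parameters
q = 1.0 / (1.0 + np.exp(-logits))     # sigmoid -> inclusion probabilities q_j

# Prune nodes whose inclusion probability falls below a threshold
# (0.5 corresponds to the median-probability-model rule).
threshold = 0.5
keep = q >= threshold

W = rng.normal(size=(8, 4))           # dense weight matrix (nodes x inputs)
W_pruned = W[keep]                    # retain only the selected nodes

print(f"{keep.sum()} of {len(q)} nodes retained")
```

The same thresholding applied to input-layer nodes yields feature selection rather than node pruning.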