Bayesian (deep) neural networks (BNNs) are often more attractive than mainstream point-estimate deep learning in various respects, including uncertainty quantification, robustness to noise, and resistance to overfitting. Variational inference (VI) is one of the most widely adopted approximate inference methods. While the ELBO-based variational free energy objective is the dominant choice in the literature, in this paper we introduce a score-based alternative for BNN variational inference. Although quite a few score-based variational inference methods have been proposed in the community, most are inadequate for large-scale BNNs for various computational and technical reasons. We propose a novel scalable VI method whose learning objective combines a score matching loss with a proximal penalty term at each iteration; this allows our method to avoid reparametrized sampling and to admit noisy, unbiased mini-batch scores via stochastic gradients. This in turn makes our method scalable to large-scale neural networks, including Vision Transformers, and allows for richer variational density families. On several benchmarks, including visual recognition and time-series forecasting with large-scale deep networks, we empirically show the effectiveness of our approach.
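To make the flavor of the objective concrete, here is a schematic toy sketch (not the paper's algorithm): it fits a 1-D Gaussian variational density q(w) = N(mu, sigma^2) to a Gaussian target p(w) = N(2, 0.5^2) by descending a score-matching (Fisher divergence) loss plus a proximal penalty tying each iterate to the previous one, with unbiased gradient noise standing in for noisy mini-batch scores. All names and constants (`mu`, `log_sigma`, `lam`, the target parameters) are illustrative assumptions; for two Gaussians the loss E_q[(d/dw log q - d/dw log p)^2] has the closed form used in the comments.

```python
import numpy as np

# Toy sketch only -- NOT the proposed algorithm. Score matching loss for
# Gaussian q = N(mu, sigma^2) vs. target p = N(m, s^2) in closed form:
#   L = (1/s^2 - 1/sigma^2)^2 * sigma^2 + (mu - m)^2 / s^4
# Each outer iteration fixes a proximal anchor (the previous iterate) and
# takes a few noisy gradient steps penalized toward that anchor.

rng = np.random.default_rng(0)
m, s = 2.0, 0.5                    # target mean / std
mu, log_sigma = 0.0, 0.0           # variational parameters
lam, lr = 0.5, 0.02                # proximal weight, step size

for outer in range(300):
    mu_a, ls_a = mu, log_sigma     # proximal anchor = previous iterate
    for inner in range(5):
        sigma = np.exp(log_sigma)
        # exact gradients of L w.r.t. mu and log(sigma)
        g_mu = 2.0 * (mu - m) / s**4
        g_ls = 2.0 * sigma**2 / s**4 - 2.0 / sigma**2
        # unbiased noise mimics noisy mini-batch score estimates
        g_mu += 0.5 * rng.standard_normal()
        g_ls += 0.5 * rng.standard_normal()
        # proximal penalty keeps the update near the anchor
        mu -= lr * (g_mu + lam * (mu - mu_a))
        log_sigma -= lr * (g_ls + lam * (log_sigma - ls_a))

print(f"mu={mu:.2f}, sigma={np.exp(log_sigma):.2f}")  # close to (2.0, 0.5)
```

Because the samples never enter the gradient here, no reparametrization trick is needed; in the actual method the same effect is achieved for general variational families and mini-batched posterior scores.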