Bayesian neural network inference is often carried out using stochastic gradient sampling methods. For best performance, these methods should use a Riemannian metric that improves posterior exploration by accounting for the local curvature, but existing methods resort to simple diagonal metrics to remain computationally efficient, which sacrifices some of these gains. We propose two non-diagonal metrics that can be used in stochastic samplers to improve convergence and exploration while adding only a minor computational overhead over diagonal metrics. We show that for neural networks with complex posteriors, caused for example by the use of sparsity-inducing priors, these metrics provide clear improvements. For some other model choices, the posterior is easy enough that the simpler metrics also suffice.
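The role of the metric can be illustrated with a toy example. The sketch below is not one of the metrics proposed here, which the abstract does not specify. It runs a preconditioned Langevin sampler with a constant metric G on a strongly correlated 2-D Gaussian, comparing a diagonal metric (the diagonal of the curvature) against the full, non-diagonal curvature matrix. The target, the correlation `rho`, the step size `eps`, and the Gaussian gradient-noise model standing in for minibatch gradients are all illustrative assumptions. Using the exact Hessian as the metric is only feasible in this tiny example; for real networks it is far too expensive, which is why cheap non-diagonal approximations are of interest.

```python
import numpy as np

# Toy target (assumption for illustration): a strongly correlated 2-D Gaussian,
# standing in for a posterior whose geometry is not axis-aligned.
rho = 0.99
cov = np.array([[1.0, rho], [rho, 1.0]])
prec = np.linalg.inv(cov)  # Hessian of -log p, i.e. the local curvature


def noisy_grad_log_p(theta, rng, noise_std=0.1):
    """Gradient of log N(0, cov) plus Gaussian noise mimicking a minibatch estimate."""
    return -prec @ theta + noise_std * rng.standard_normal(theta.shape)


def precond_langevin(metric, eps=0.1, n_steps=20_000, seed=0):
    """Preconditioned Langevin sampler with a constant metric G.

    Update: theta <- theta + eps/2 * G^{-1} grad + N(0, eps * G^{-1}).
    Because G does not depend on theta, the Riemannian correction term vanishes.
    """
    rng = np.random.default_rng(seed)
    g_inv = np.linalg.inv(metric)
    chol = np.linalg.cholesky(eps * g_inv)  # scale of the injected noise
    theta = np.zeros(2)
    samples = np.empty((n_steps, 2))
    for t in range(n_steps):
        drift = 0.5 * eps * g_inv @ noisy_grad_log_p(theta, rng)
        theta = theta + drift + chol @ rng.standard_normal(2)
        samples[t] = theta
    return samples


def autocorr(x, lag):
    """Sample autocorrelation of a 1-D chain at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


diag_metric = np.diag(np.diag(prec))  # keeps only per-coordinate curvature
full_metric = prec                    # full (non-diagonal) curvature

# Project onto (1, 1)/sqrt(2), the low-curvature direction that mixes slowly.
slow_dir = np.array([1.0, 1.0]) / np.sqrt(2)
lag100 = {}
for name, G in [("diagonal", diag_metric), ("full", full_metric)]:
    chain = precond_langevin(G)
    lag100[name] = autocorr(chain @ slow_dir, 100)
    print(name, "lag-100 autocorrelation:", round(lag100[name], 3))
```

Here the diagonal metric reduces to a scalar rescaling, so it cannot undo the correlation and the chain moves very slowly along the elongated direction (high autocorrelation). The full metric whitens the target, and the same step size mixes quickly.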