Stochastic-gradient sampling methods are often used to perform Bayesian inference on neural networks. It has been observed that methods incorporating notions of differential geometry tend to perform better, with the Riemannian metric improving posterior exploration by accounting for local curvature. However, to remain computationally efficient, existing methods often resort to simple diagonal metrics, which forfeits some of these gains. We propose two non-diagonal metrics that can be used in stochastic-gradient samplers to improve convergence and exploration while adding only a minor computational overhead relative to diagonal metrics. We show that for fully connected neural networks (NNs) with sparsity-inducing priors and convolutional NNs with correlated priors, using these metrics can provide improvements. For some other choices of prior, the posterior is easy enough to sample that the simpler metrics suffice.
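To illustrate why a diagonal metric can leave gains on the table, here is a minimal sketch of a preconditioned stochastic-gradient Langevin sampler. It is not either of the metrics proposed in the paper. Assumptions: the preconditioner `P` plays the role of the inverse metric G^{-1} and is constant, so the curvature-correction term required by position-dependent Riemannian samplers vanishes. The target is a toy, strongly correlated 2-D Gaussian. Minibatch gradient noise is imitated by adding Gaussian noise to the exact gradient. The names `precond_sgld` and `max_stable_step` are hypothetical. The dense choice `P = Sigma` is an oracle that would be unaffordable for a real network; it only shows what a non-diagonal metric can buy.

```python
import numpy as np


def precond_sgld(grad_log_p, theta0, precond, step_size, n_steps, rng,
                 grad_noise_std=0.0):
    """Langevin sampler with a constant preconditioner P (the inverse metric G^{-1}).

    Update: theta <- theta + (eps / 2) * P @ g + sqrt(eps) * L @ xi, with L @ L.T == P.
    Because P is constant, the curvature-correction term needed by
    position-dependent Riemannian samplers is zero and is omitted.
    """
    chol = np.linalg.cholesky(precond)          # L such that L @ L.T == P
    theta = np.asarray(theta0, dtype=float).copy()
    samples = np.empty((n_steps, theta.size))
    for t in range(n_steps):
        g = grad_log_p(theta)
        if grad_noise_std > 0.0:
            # Hypothetical stand-in for minibatch gradient noise.
            g = g + grad_noise_std * rng.standard_normal(theta.size)
        xi = rng.standard_normal(theta.size)
        theta = theta + 0.5 * step_size * (precond @ g) + np.sqrt(step_size) * (chol @ xi)
        samples[t] = theta
    return samples


def max_stable_step(precond, neg_hessian):
    """Largest eps keeping the spectral radius of I - (eps/2) P H below 1 (Gaussian target)."""
    lam_max = np.max(np.linalg.eigvals(precond @ neg_hessian).real)
    return 4.0 / lam_max


# Toy target: strongly correlated 2-D Gaussian N(0, Sigma).
rho = 0.99
sigma = np.array([[1.0, rho], [rho, 1.0]])
sigma_inv = np.linalg.inv(sigma)            # negative Hessian of log p


def grad_log_p(theta):
    return -sigma_inv @ theta


diag_precond = np.diag(np.diag(sigma))      # best diagonal choice: diag(Sigma), which is I here
full_precond = sigma                        # non-diagonal (oracle) choice: Sigma itself

step_diag = max_stable_step(diag_precond, sigma_inv)
step_full = max_stable_step(full_precond, sigma_inv)
print(f"max stable step, diagonal: {step_diag:.3f}, full: {step_full:.3f}")

rng = np.random.default_rng(0)
samples = precond_sgld(grad_log_p, np.zeros(2), full_precond, step_size=0.2,
                       n_steps=50_000, rng=rng, grad_noise_std=0.1)
cov_full = np.cov(samples, rowvar=False)
print("sample covariance (full metric):\n", cov_full)
```

With the diagonal preconditioner, the step size must stay below 4 divided by the largest eigenvalue of `P @ sigma_inv`, which is about 0.04 here because the stiff direction has variance 1 - rho = 0.01. The flat direction, with variance 1 + rho, is then explored very slowly. The non-diagonal preconditioner equalizes the curvature, so a step of 0.2 is stable and the sampler recovers `sigma` up to a small discretization bias. Finding metrics with much of this benefit at close to diagonal cost is the trade-off the abstract describes.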