Bayesian inference allows us to define a posterior distribution over the weights of a generic neural network (NN). Exact posteriors are usually intractable, in which case approximations can be employed. One such approximation, variational inference, is computationally efficient when using mini-batch stochastic gradient descent, since only subsets of the data are used for likelihood and gradient evaluations; however, the approach relies on selecting a variational distribution that sufficiently matches the form of the posterior. Particle-based methods such as Markov chain Monte Carlo and Sequential Monte Carlo (SMC) do not assume a parametric family for the posterior but typically incur a higher computational cost, much of which comes from using the full batch of data for likelihood and gradient evaluations. We explore several methods of gradually introducing more mini-batches of data (data annealing) into the likelihood and gradient evaluations of an SMC sampler. We find that we can achieve up to $6\times$ faster training with minimal loss in accuracy on benchmark image classification problems using NNs.
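The data-annealing idea described above can be sketched as follows. This is a minimal illustrative example, not the paper's method: the linear schedule, the function names (`annealing_schedule`, `annealed_log_likelihood`), and the rescaling choice are all assumptions made for illustration.

```python
import numpy as np

def annealing_schedule(num_iters, num_batches):
    """Hypothetical data-annealing rule: linearly increase the number of
    mini-batches used per SMC iteration, from 1 up to the full set."""
    return np.ceil(np.linspace(1, num_batches, num_iters)).astype(int)

def annealed_log_likelihood(theta, batches, n_active, log_lik_fn):
    """Evaluate the log-likelihood on only the first n_active mini-batches,
    rescaled so its magnitude matches the full-data log-likelihood
    (one possible convention; other annealing variants differ here)."""
    scale = len(batches) / n_active
    return scale * sum(log_lik_fn(theta, b) for b in batches[:n_active])
```

For example, with 10 SMC iterations and 8 mini-batches, `annealing_schedule(10, 8)` starts at 1 batch and ends at all 8, so early (cheap) iterations use little data while later iterations approach the full-batch likelihood that standard SMC would evaluate at every step.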