We provide a rigorous analysis of variational inference (VI) training of two-layer Bayesian neural networks in the infinite-width limit. We consider a regression problem with a regularized evidence lower bound (ELBO), which decomposes into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. With an appropriate weighting of the KL term, we prove a law of large numbers for three training schemes: (i) an idealized scheme in which the multiple Gaussian integral arising from the reparametrization trick is computed exactly, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new and computationally cheaper algorithm, which we introduce as Minimal VI. An important result is that all three methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.
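To make the objective concrete, the following is a minimal NumPy sketch of a one-sample Monte Carlo estimate of a KL-weighted negative ELBO for a two-layer network, using the reparametrization trick as in Bayes by Backprop-style schemes. It is an illustration under stated assumptions and not the exact setup analyzed here: the standard normal prior, the diagonal Gaussian variational family, the `tanh` activation, the squared-error data term, and the names `kl_weight`, `mean`, and `rho` are all hypothetical choices.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps standard deviations positive.
    return np.logaddexp(0.0, x)


def kl_diag_gaussian_to_standard_normal(mean, std):
    # Closed-form KL( N(mean, diag(std^2)) || N(0, I) ), summed over all coordinates.
    # Assumption: standard normal prior and diagonal Gaussian variational posterior.
    return 0.5 * np.sum(std**2 + mean**2 - 1.0 - 2.0 * np.log(std))


def negative_elbo_estimate(mean, rho, x, y, kl_weight, rng):
    """One-sample Monte Carlo estimate of a KL-weighted negative ELBO.

    mean, rho : arrays of shape (n_neurons, 1 + d), variational parameters
    x         : array of shape (batch, d), inputs
    y         : array of shape (batch,), regression targets
    kl_weight : scalar weight on the KL term (hypothetical name)
    """
    std = softplus(rho)
    eps = rng.standard_normal(mean.shape)
    w = mean + std * eps  # reparametrization: w is a deterministic map of eps

    # Each neuron holds one output weight a and d input weights b.
    a, b = w[:, 0], w[:, 1:]
    # Two-layer network output, averaged over neurons (mean-field scaling).
    pred = np.mean(a[:, None] * np.tanh(b @ x.T), axis=0)

    # Squared error stands in for the negative Gaussian log-likelihood,
    # up to additive and multiplicative constants.
    data_term = 0.5 * np.mean((pred - y) ** 2)
    kl_term = kl_diag_gaussian_to_standard_normal(mean, std)
    return data_term + kl_weight * kl_term


rng = np.random.default_rng(0)
n_neurons, d, batch = 50, 3, 8
mean = 0.1 * rng.standard_normal((n_neurons, 1 + d))
rho = np.zeros((n_neurons, 1 + d))
x = rng.standard_normal((batch, d))
y = rng.standard_normal(batch)

loss = negative_elbo_estimate(mean, rho, x, y, kl_weight=1.0 / n_neurons, rng=rng)
print(float(loss))
```

The choice `kl_weight = 1.0 / n_neurons` is only an example of a width-dependent weighting; the appropriate scaling is the one established in the analysis itself.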