Stein variational gradient descent (SVGD) [Liu and Wang, 2016] performs approximate Bayesian inference by representing the posterior with a set of particles. However, SVGD suffers from variance collapse, i.e., poor predictions due to underestimated uncertainty [Ba et al., 2021], even for moderately dimensional models such as small Bayesian neural networks (BNNs). To address this issue, we generalize SVGD by letting each particle parameterize a component distribution in a mixture model. Our method, Stein Mixture Inference (SMI), optimizes a lower bound on the evidence (the ELBO) and introduces user-specified guides parameterized by particles. SMI extends the Nonlinear SVGD (NSVGD) framework [Wang and Liu, 2019] to the case of variational Bayes. SMI effectively avoids variance collapse, as judged by a previously published test designed for this purpose, and performs well on standard data sets. In addition, SMI requires considerably fewer particles than SVGD to accurately estimate uncertainty for small BNNs. The synergistic combination of NSVGD, ELBO optimization and user-specified guides establishes a promising approach to variational Bayesian inference on tall and wide data.
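For orientation, the sketch below shows the baseline SVGD update that SMI generalizes, written in JAX. The RBF kernel with the median bandwidth heuristic follows Liu and Wang [2016]; the toy target, particle count, and step size are illustrative choices, not details from the paper.

```python
# Minimal SVGD sketch (Liu & Wang, 2016). Particles follow the kernelized
# Stein update: phi(x_i) = (1/n) sum_j [k(x_j, x_i) grad log p(x_j)
#                                       + grad_{x_j} k(x_j, x_i)].
import jax
import jax.numpy as jnp

def rbf_kernel(X, bandwidth=None):
    """RBF kernel matrix over particles, with the median bandwidth heuristic."""
    sq = jnp.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:
        bandwidth = jnp.median(sq) / jnp.log(X.shape[0] + 1.0) + 1e-8
    return jnp.exp(-sq / bandwidth), bandwidth

def svgd_step(X, grad_logp, step_size=1e-1):
    """One SVGD update: attraction toward high density plus kernel repulsion."""
    n = X.shape[0]
    K, h = rbf_kernel(X)
    grads = jax.vmap(grad_logp)(X)  # score function at each particle, (n, d)
    # Closed-form sum_j grad_{x_j} k(x_j, x_i) for the RBF kernel.
    repulsion = (2.0 / h) * (K.sum(axis=1, keepdims=True) * X - K @ X)
    return X + step_size * (K @ grads + repulsion) / n

# Toy target: a standard 2-D Gaussian (illustrative only).
logp = lambda x: -0.5 * jnp.sum(x ** 2)
X = 3.0 * jax.random.normal(jax.random.PRNGKey(0), (50, 2))
for _ in range(500):
    X = svgd_step(X, jax.grad(logp))
```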
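To make the particles-as-guides idea concrete, the following conceptual sketch (reusing `rbf_kernel` from above) lets each particle be the mean of an isotropic Gaussian guide component, so the variational family is a uniform mixture, and moves particles along a reparameterized ELBO gradient combined with SVGD-style repulsion. The guide family, the fixed scale `sigma`, and the name `smi_like_step` are assumptions for illustration; the actual SMI update is derived from the NSVGD functional gradient and differs in detail.

```python
# Conceptual sketch: m particles are the means of isotropic Gaussian guide
# components, q(z) = (1/m) sum_i N(z; theta_i, sigma^2 I). Each particle
# follows a reparameterized ELBO gradient plus kernel repulsion.
# Illustrative only -- not the paper's SMI update.
import jax
import jax.numpy as jnp

sigma = 0.5  # fixed guide scale (an assumption; SMI allows general guides)

def log_q_mixture(z, thetas):
    """Log density of the uniform mixture of isotropic Gaussian guides."""
    d = z.shape[-1]
    log_comps = (-0.5 * jnp.sum((z - thetas) ** 2, axis=-1) / sigma**2
                 - 0.5 * d * jnp.log(2 * jnp.pi * sigma**2))
    return jax.scipy.special.logsumexp(log_comps) - jnp.log(thetas.shape[0])

def neg_elbo(thetas, log_p, key):
    """One-sample-per-component (stratified) estimate of -ELBO."""
    eps = jax.random.normal(key, thetas.shape)
    zs = thetas + sigma * eps  # reparameterized draw from each component
    terms = jax.vmap(lambda z: log_p(z) - log_q_mixture(z, thetas))(zs)
    return -jnp.mean(terms)

def smi_like_step(thetas, log_p, key, step_size=1e-2):
    """ELBO ascent on guide parameters with SVGD-style kernel repulsion."""
    grads = -jax.grad(neg_elbo)(thetas, log_p, key)  # ascend the ELBO
    K, h = rbf_kernel(thetas)                        # kernel in parameter space
    repulsion = (2.0 / h) * (K.sum(axis=1, keepdims=True) * thetas - K @ thetas)
    return thetas + step_size * (K @ grads + repulsion) / thetas.shape[0]
```

Because the log-mixture term `log_q_mixture` couples the components, each particle's gradient accounts for the other guides, which is what distinguishes a mixture ELBO from running m independent variational fits.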