Along with Markov chain Monte Carlo (MCMC) methods, variational inference (VI) has emerged as a central computational approach to large-scale Bayesian inference. Rather than sampling from the true posterior $\pi$, VI aims to produce a simple but effective approximation $\hat \pi$ to $\pi$ for which summary statistics are easy to compute. However, unlike those for the well-studied MCMC methodology, algorithmic guarantees for VI remain comparatively poorly understood. In this work, we propose principled methods for VI in which $\hat \pi$ is taken to be a Gaussian or a mixture of Gaussians; these methods rest upon the theory of gradient flows on the Bures--Wasserstein space of Gaussian measures. Akin to MCMC, they come with strong theoretical guarantees when $\pi$ is log-concave.
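For orientation, the two objects named in the abstract can be written out under standard conventions. This is a sketch, not the paper's own statement: the symbols $m$, $\Sigma$, $m_i$, $\Sigma_i$ and the choice of the Kullback-Leibler (KL) divergence as the VI objective are assumptions that the abstract does not spell out. Gaussian VI is commonly posed as a KL minimization over Gaussians, and the Bures--Wasserstein space is the set of Gaussian measures equipped with the 2-Wasserstein distance, which has a closed form:

$$
\hat \pi \in \operatorname*{arg\,min}_{p = \mathcal{N}(m, \Sigma)} \mathrm{KL}(p \,\|\, \pi),
\qquad
W_2^2\bigl(\mathcal{N}(m_0, \Sigma_0), \mathcal{N}(m_1, \Sigma_1)\bigr)
= \|m_0 - m_1\|^2 + \operatorname{tr}\Bigl(\Sigma_0 + \Sigma_1 - 2\bigl(\Sigma_0^{1/2} \Sigma_1 \Sigma_0^{1/2}\bigr)^{1/2}\Bigr).
$$

In one dimension, with standard deviations $\sigma_0$ and $\sigma_1$, the distance reduces to $W_2^2 = (m_0 - m_1)^2 + (\sigma_0 - \sigma_1)^2$.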