Variational Inference with Gaussian Score Matching

Variational inference (VI) is a method to approximate the computationally intractable posterior distributions that arise in Bayesian statistics. Typically, VI fits a simple parametric distribution to the target posterior by optimizing an appropriate objective such as the evidence lower bound (ELBO). In this work, we present a new approach to VI based on the principle of score matching: if two distributions are equal, then their score functions (i.e., gradients of the log density) are equal at every point on their support. With this, we develop score matching VI, an iterative algorithm that seeks to match the scores between the variational approximation and the exact posterior. At each iteration, score matching VI solves an inner optimization, one that minimally adjusts the current variational estimate to match the scores at a newly sampled value of the latent variables. We show that when the variational family is Gaussian, this inner optimization has a closed-form solution, and we call the resulting algorithm Gaussian score matching VI (GSM-VI). GSM-VI is also a "black box" variational algorithm in that it only requires a differentiable joint distribution, and as such it can be applied to a wide class of models. We compare GSM-VI to black box variational inference (BBVI), which has similar requirements but instead optimizes the ELBO. We study how GSM-VI behaves as a function of the problem dimensionality, the condition number of the target covariance matrix (when the target is Gaussian), and the degree of mismatch between the approximating and exact posterior distributions. We also study GSM-VI on a collection of real-world Bayesian inference problems from the posteriorDB database of datasets and models. In all of our studies we find that GSM-VI is faster than BBVI without sacrificing accuracy: it requires 10-100x fewer gradient evaluations to obtain a comparable quality of approximation.
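
The iteration described above (sample a latent value from the current Gaussian, evaluate the target score there, then adjust the Gaussian so its own score agrees at that point) can be illustrated with a minimal one-dimensional sketch. The abstract does not state the closed-form update, so the update below is an assumption: it is one closed-form solution of the score-matching constraint `var_new * g == mu_new - z`, and it is not presented as the exact GSM-VI formula. The function names and the Gaussian target N(3, 0.5) are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_score(z, mu, var):
    """Score (derivative of the log density in z) of a 1-D Gaussian N(mu, var)."""
    return -(z - mu) / var


def score_matching_step(mu, var, z, g):
    """Adjust (mu, var) so that the Gaussian's score at z equals g.

    Hypothetical helper: an assumed closed-form update that satisfies
    var_new * g == mu_new - z; not taken from the source text.
    """
    v = mu - z                # offset of the current mean from the sample
    c = var + v * v           # second moment of the current Gaussian about z
    # rho is the non-negative root of rho * (1 + rho) = g^2 * c
    rho = 0.5 * (math.sqrt(1.0 + 4.0 * g * g * c) - 1.0)
    mu_new = z + c * g / (1.0 + rho)
    var_new = c / (1.0 + rho)  # positive because c > 0 and rho >= 0
    return mu_new, var_new


def target_score(z, m=3.0, s2=0.5):
    """Score of an illustrative target N(m, s2); stands in for the model's gradient."""
    return -(z - m) / s2


def run(num_iters=200, seed=0):
    """Repeat: sample from the current Gaussian, then match the target score there."""
    rng = random.Random(seed)
    mu, var = 0.0, 1.0         # initial variational estimate
    for _ in range(num_iters):
        z = rng.gauss(mu, math.sqrt(var))
        g = target_score(z)
        mu, var = score_matching_step(mu, var, z, g)
    return mu, var


if __name__ == "__main__":
    print(run())
```

In this sketch the only information taken from the target is its score at the sampled point, which mirrors the "black box" requirement of a differentiable joint distribution. When the current Gaussian already equals the target, the step leaves it unchanged, which is the fixed-point behavior the score-matching principle suggests.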
