Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though~$p$ itself does not factorize. We show that this mismatch leads to an impossibility theorem: if $p$ does not factorize, then any factorized approximation $q\in Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which can be related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general R\'enyi divergences, and a score-based divergence that compares $\nabla \log p$ and $\nabla \log q$. We provide a thorough theoretical analysis in the setting where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. We show that all the considered divergences can be \textit{ordered} based on the estimates of uncertainty they yield as objective functions for~VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian.
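To make the impossibility concrete, here is a minimal worked example in the Gaussian setting analyzed above; the bivariate target and its specific correlation structure are our illustrative assumptions, not taken from the original text. Let the target be $p = \mathcal{N}(0, \Sigma)$ with correlation $\rho \in (-1,1)$:
\[
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad
\Sigma^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}.
\]
A factorized approximation $q = \mathcal{N}\big(0, \operatorname{diag}(\sigma_1^2, \sigma_2^2)\big)$ matches (i) the marginal variances iff $\sigma_i^2 = \Sigma_{ii} = 1$; (ii) the marginal precisions iff $\sigma_i^{-2} = (\Sigma^{-1})_{ii}$, i.e., $\sigma_i^2 = 1-\rho^2$; and (iii) the generalized variance iff $\sigma_1^2 \sigma_2^2 = \det\Sigma = 1-\rho^2$. For $\rho \neq 0$, any two of these conditions are incompatible: (i) forces $\sigma_1^2\sigma_2^2 = 1 > 1-\rho^2$, (ii) forces $\sigma_1^2\sigma_2^2 = (1-\rho^2)^2 < 1-\rho^2$, and $1 \neq 1-\rho^2$, so $q$ can satisfy at most one of (i)--(iii). It is standard, in this Gaussian setting, that minimizing the reverse divergence $\mathrm{KL}(q\,\|\,p)$ recovers (ii), while minimizing the forward divergence $\mathrm{KL}(p\,\|\,q)$ recovers (i).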