Given an intractable distribution $p$, the problem of variational inference (VI) is to compute the best approximation $q$ from some more tractable family $\mathcal{Q}$. Most commonly, the approximation is found by minimizing a Kullback-Leibler (KL) divergence. However, there exist other valid choices of divergences, and when $\mathcal{Q}$ does not contain~$p$, each divergence champions a different solution. We analyze how the choice of divergence affects the outcome of VI when a Gaussian with a dense covariance matrix is approximated by a Gaussian with a diagonal covariance matrix. In this setting, we show that different divergences can be \textit{ordered} by the amount by which their variational approximations misestimate various measures of uncertainty, such as the variance, precision, and entropy. We also derive an impossibility theorem showing that no two of these measures can be simultaneously matched by a factorized approximation; hence, the choice of divergence informs which measure, if any, is correctly estimated. Our analysis covers the KL divergence, the R\'enyi divergences, and a score-based divergence that compares $\nabla\log p$ and $\nabla\log q$. We empirically evaluate whether these orderings hold when VI is used to approximate non-Gaussian distributions.
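
As a small illustration of the Gaussian setting described above, the following sketch compares the two KL-optimal diagonal approximations of a correlated Gaussian. It relies on the standard closed forms for this case (with means assumed to be matched): minimizing $\mathrm{KL}(p\,\|\,q)$ recovers the marginal variances $\Sigma_{ii}$, while minimizing $\mathrm{KL}(q\,\|\,p)$ recovers the diagonal of the precision matrix, giving variances $1/(\Sigma^{-1})_{ii}$. The function names and the example covariance are hypothetical and chosen only for illustration; the sketch does not cover the R\'enyi or score-based divergences.

```python
import numpy as np


def factorized_gaussian_fits(Sigma):
    """Variances of the diagonal Gaussians minimizing each direction of KL.

    Assumes the target p is N(mu, Sigma) and that q shares the mean mu.
    """
    Sigma = np.asarray(Sigma, dtype=float)
    precision = np.linalg.inv(Sigma)
    # argmin_q KL(p || q): q matches the marginal variances of p.
    var_forward = np.diag(Sigma).copy()
    # argmin_q KL(q || p): q matches the diagonal of the precision of p.
    var_reverse = 1.0 / np.diag(precision)
    return var_forward, var_reverse


def gaussian_entropy(logdet_cov, dim):
    """Differential entropy of a Gaussian given log det of its covariance."""
    return 0.5 * (dim * (1.0 + np.log(2.0 * np.pi)) + logdet_cov)


# Illustrative target: two strongly correlated coordinates (an assumption).
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
dim = Sigma.shape[0]

var_fwd, var_rev = factorized_gaussian_fits(Sigma)

H_p = gaussian_entropy(np.linalg.slogdet(Sigma)[1], dim)
H_fwd = gaussian_entropy(np.sum(np.log(var_fwd)), dim)
H_rev = gaussian_entropy(np.sum(np.log(var_rev)), dim)

print("variances, KL(p||q) fit:", var_fwd)
print("variances, KL(q||p) fit:", var_rev)
print("entropy: reverse fit, target, forward fit:", H_rev, H_p, H_fwd)
```

In this example the $\mathrm{KL}(q\,\|\,p)$ fit matches the precision but underestimates both the variance and the entropy, whereas the $\mathrm{KL}(p\,\|\,q)$ fit matches the variance but overestimates the entropy, so neither approximation matches two of the three measures at once.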