Variational inference (VI) approximates a target density $p$ by the best match $q$ in a family of tractable distributions. The best variational approximation is found by minimizing a divergence between distributions, $D(p||q)$. Several divergences have been proposed as objective functions for VI, and different choices lead to different approximations. We show that even when these divergences have different minimizers, the resulting approximations all abide by certain symmetry-matching principles. Specifically, our results hold for all $f$-divergences, a broad class that includes the reverse and forward Kullback-Leibler divergences and the $\alpha$-divergences. We show that when $p$ has even symmetry, any stationary point of an $f$-divergence is guaranteed to recover the mean of $p$; likewise, when $p$ has elliptical symmetry, any stationary point is guaranteed to recover the correlation matrix of $p$. To obtain these guarantees we assume that $p$ and $q$ are unimodal, but notably we do not require them to be log-concave, light-tailed, or even smooth everywhere. These guarantees generalize a previous result obtained for the reverse Kullback-Leibler divergence when $p$ is log-concave. They also extend to cases where the target density $p$ exhibits symmetry along some but not all of its coordinates. Such partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry.
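The mean-recovery claim under even symmetry can be checked numerically on a toy problem. The sketch below is an illustration only, not the paper's method: it assumes a one-dimensional Student-t-shaped target (heavy-tailed and not log-concave) centred at a hypothetical value `MU`, a Gaussian family $q = N(m, s^2)$, and a coarse grid search in place of a proper optimizer. All names (`MU`, `reverse_kl`, `forward_kl`, `best_fit`) are made up for this example.

```python
import math

MU = 1.5  # centre of symmetry of the toy target (hypothetical value)

# Grid symmetric about MU, used for simple Riemann-sum integration.
N, HALF_WIDTH = 1201, 30.0
DX = 2.0 * HALF_WIDTH / (N - 1)
XS = [MU - HALF_WIDTH + i * DX for i in range(N)]


def log_p_unnormalized(x):
    # Student-t shape with 3 degrees of freedom: even about MU,
    # unimodal, heavy-tailed, and not log-concave.
    return -2.0 * math.log1p((x - MU) ** 2 / 3.0)


# Normalize the target on the (truncated) grid.
LOG_Z = math.log(sum(math.exp(log_p_unnormalized(x)) for x in XS) * DX)
LOG_P = [log_p_unnormalized(x) - LOG_Z for x in XS]
P = [math.exp(lp) for lp in LOG_P]


def log_q(x, m, s):
    # Log-density of the Gaussian approximation N(m, s^2).
    return -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)


def reverse_kl(m, s):
    # KL(q || p): integral of q * (log q - log p).
    total = 0.0
    for x, lp in zip(XS, LOG_P):
        lq = log_q(x, m, s)
        total += math.exp(lq) * (lq - lp)
    return total * DX


def forward_kl(m, s):
    # KL(p || q): integral of p * (log p - log q).
    total = 0.0
    for x, lp, p in zip(XS, LOG_P, P):
        total += p * (lp - log_q(x, m, s))
    return total * DX


# Coarse candidate grid over location m and scale s (an assumption of this sketch).
MS = [i / 10 for i in range(31)]          # 0.0, 0.1, ..., 3.0
SS = [0.5 + 0.25 * j for j in range(11)]  # 0.5, 0.75, ..., 3.0


def best_fit(divergence):
    # Return the (m, s) pair on the candidate grid with the smallest divergence.
    _, m, s = min((divergence(m, s), m, s) for m in MS for s in SS)
    return m, s


if __name__ == "__main__":
    print("reverse KL fit (m, s):", best_fit(reverse_kl))
    print("forward KL fit (m, s):", best_fit(forward_kl))
```

In this setup both divergences should select the location $m$ equal to `MU`, while the selected scales will typically differ, which matches the claim that different divergences give different approximations yet agree on the mean.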