In variational inference (VI), the practitioner approximates a high-dimensional distribution $π$ with a simpler surrogate, often a (product) Gaussian distribution. However, in many cases of practical interest, a Gaussian may fail to capture the correct radial profile of $π$, resulting in poor coverage. In this work, we approach the VI problem from the perspective of optimizing over these radial profiles. Our algorithm, radVI, is a cheap, effective add-on to many existing VI schemes, such as Gaussian (mean-field) VI and the Laplace approximation. We provide theoretical convergence guarantees for our algorithm, building on recent developments in optimization over the Wasserstein space (the space of probability distributions endowed with the Wasserstein distance) and on new regularity properties of radial transport maps in the style of Caffarelli (2000).