In variational inference (VI), the practitioner approximates a high-dimensional distribution $π$ with a simple surrogate, often a (product) Gaussian distribution. However, in many cases of practical interest, Gaussian distributions might not capture the correct radial profile of $π$, resulting in poor coverage. In this work, we approach the VI problem from the perspective of optimizing over these radial profiles. Our algorithm radVI is a cheap, effective add-on to many existing VI schemes, such as Gaussian (mean-field) VI and the Laplace approximation. We provide theoretical convergence guarantees for our algorithm, owing to recent developments in optimization over the Wasserstein space (the space of probability distributions endowed with the Wasserstein distance) and new regularity properties of radial transport maps in the style of Caffarelli (2000).