Variational inference approximates a target distribution by optimizing over the parameters of a variational family. Wasserstein gradient flows, by contrast, describe optimization over the space of probability measures, which need not admit a parametric density. In this paper, we bridge the gap between these two methods. We demonstrate that, under certain conditions, the Bures-Wasserstein gradient flow can be recast as a Euclidean gradient flow whose forward Euler scheme is the standard black-box variational inference algorithm. Specifically, the vector field of the gradient flow is generated by the path-derivative gradient estimator. We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure for the Wasserstein gradient flow. This distillation view extends to $f$-divergences and non-Gaussian variational families, yielding a new gradient estimator for $f$-divergences that is readily implementable with contemporary machine learning libraries such as PyTorch or TensorFlow.
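To make the connection between the path-derivative gradient estimator and a forward Euler step concrete, the following is a minimal sketch, not the paper's implementation. It assumes a one-dimensional Gaussian variational family $q = N(\mu, \sigma^2)$, a toy Gaussian target, the reverse KL objective, and hand-written scores in place of automatic differentiation. The names `target_score`, `path_derivative_grad`, and `fit` are hypothetical, and the sketch does not check whether this $(\mu, \sigma)$ parameterization satisfies the conditions under which the Bures-Wasserstein equivalence holds.

```python
import random


def target_score(z, m=2.0, s=0.5):
    """Score d/dz log p(z) of a toy target p = N(m, s^2) (assumed for illustration)."""
    return -(z - m) / (s * s)


def path_derivative_grad(mu, sigma, score_p, n_samples, rng):
    """Monte Carlo path-derivative gradient of KL(q || p) for q = N(mu, sigma^2).

    Samples are reparameterized as z = mu + sigma * eps. Only the dependence
    of log q(z) - log p(z) on the parameters through z is differentiated; the
    direct dependence of log q on (mu, sigma) is dropped.
    """
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        score_q = -(z - mu) / (sigma * sigma)  # d/dz log q(z)
        v = score_q - score_p(z)               # d/dz [log q(z) - log p(z)]
        g_mu += v                              # chain rule: dz/dmu = 1
        g_sigma += v * eps                     # chain rule: dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples


def fit(score_p, mu=0.0, sigma=1.0, step=0.05, n_steps=2000, n_samples=16, seed=0):
    """Forward Euler steps on (mu, sigma), i.e. plain stochastic gradient descent.

    Note: this sketch does not enforce sigma > 0.
    """
    rng = random.Random(seed)
    for _ in range(n_steps):
        g_mu, g_sigma = path_derivative_grad(mu, sigma, score_p, n_samples, rng)
        mu -= step * g_mu
        sigma -= step * g_sigma
    return mu, sigma


if __name__ == "__main__":
    print(fit(target_score))
```

In this toy setting the target lies inside the variational family, so at the optimum the per-sample quantity `v` vanishes for every draw of `eps` and the estimator has zero variance there. The target enters only through `score_p`, which is what makes the procedure black-box in this sketch.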