Variational inference (VI) can be cast as an optimization problem in which the variational parameters are tuned so that a variational distribution closely approximates the true posterior. The optimization can be carried out by vanilla gradient descent, as in black-box VI, or by natural-gradient descent, as in natural-gradient VI. In this work, we reframe VI as the optimization of an objective over probability distributions defined on a \textit{variational parameter space}. We then propose Wasserstein gradient descent for solving this optimization problem. Notably, both black-box VI and natural-gradient VI can be reinterpreted as specific instances of the proposed Wasserstein gradient descent. To make the optimization more efficient, we develop practical methods for numerically solving the discrete gradient flows. We validate the effectiveness of the proposed methods through empirical experiments on a synthetic dataset, supplemented by theoretical analyses.