Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler (KL) divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known to be the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, both in theory and in experiments with particle methods. Our third contribution is to study Gaussian approximations of the gradient flows and to develop efficient algorithms based on them; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
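The independence from the normalization constant can be illustrated with a minimal sketch, which is not the algorithm developed in the paper. It uses the unadjusted Langevin iteration, a standard particle discretization associated with the Wasserstein gradient flow of the KL divergence. The Gaussian target, the function names (`log_unnormalized_density`, `grad_log_density`, `langevin_particles`), and all parameter values are assumptions made for illustration only. The sampler touches the target only through the gradient of its log density, so an arbitrary constant `log_const` added to the log density has no effect on the particles.

```python
import numpy as np


def log_unnormalized_density(x, precision, log_const=0.0):
    """Log of c * exp(-0.5 x^T P x) per row of x; log_const = log c is arbitrary."""
    return log_const - 0.5 * np.einsum("ni,ij,nj->n", x, precision, x)


def grad_log_density(x, precision, log_const=0.0, eps=1e-5):
    """Central finite-difference gradient of the log density, one row per particle."""
    grad = np.zeros_like(x)
    for i in range(x.shape[1]):
        shift = np.zeros(x.shape[1])
        shift[i] = eps
        upper = log_unnormalized_density(x + shift, precision, log_const)
        lower = log_unnormalized_density(x - shift, precision, log_const)
        # log_const cancels in this difference, so the gradient never sees it.
        grad[:, i] = (upper - lower) / (2.0 * eps)
    return grad


def langevin_particles(precision, n_particles=2000, n_steps=500, step=0.01, seed=0):
    """Unadjusted Langevin iteration applied to a cloud of particles."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_particles, precision.shape[0]))
    for _ in range(n_steps):
        drift = grad_log_density(x, precision)
        noise = rng.standard_normal(x.shape)
        x = x + step * drift + np.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    # Hypothetical anisotropic Gaussian target with covariance diag(1, 0.25).
    precision = np.array([[1.0, 0.0], [0.0, 4.0]])
    samples = langevin_particles(precision)
    print(np.cov(samples, rowvar=False))
```

With these settings the empirical covariance of the particles should be close to the inverse of `precision`, up to Monte Carlo error and the small bias introduced by the finite step size. This sketch has no preconditioning, so it is not affine invariant in the sense discussed in the abstract.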