Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known to be the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, both in theory and in experiments with particle methods. Our third contribution is to study Gaussian approximations of the gradient flows and to develop efficient algorithms based on them; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
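To make the first two contributions concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a two-dimensional anisotropic Gaussian target and discretizes a covariance-preconditioned Langevin dynamics, one common particle approximation of a Wasserstein-type gradient flow of the Kullback-Leibler divergence. The function names, the step size, the finite-difference gradient, and the use of the ensemble covariance as preconditioner are all assumptions made for illustration, and the finite-ensemble correction term that some formulations include is omitted. The point is that the target enters only through the gradient of its log density, so shifting the log density by a constant (that is, changing the normalization constant) leaves the particle trajectories unchanged up to round-off.

```python
import numpy as np


def log_unnormalized(x, precision, log_const):
    """Row-wise log of an unnormalized Gaussian density.

    log_const shifts the log density, i.e. rescales the normalization constant.
    """
    return log_const - 0.5 * np.einsum("ni,ij,nj->n", x, precision, x)


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of f, evaluated at every row of x."""
    grad = np.zeros_like(x)
    for k in range(x.shape[1]):
        shift = np.zeros(x.shape[1])
        shift[k] = eps
        grad[:, k] = (f(x + shift) - f(x - shift)) / (2 * eps)
    return grad


def preconditioned_langevin(log_target, particles, step, n_steps, seed):
    """Euler-Maruyama steps of Langevin dynamics preconditioned by the
    ensemble covariance (illustrative choice; correction terms omitted)."""
    rng = np.random.default_rng(seed)
    x = particles.copy()
    d = x.shape[1]
    for _ in range(n_steps):
        # Small jitter keeps the Cholesky factorization well defined.
        cov = np.cov(x, rowvar=False) + 1e-10 * np.eye(d)
        chol = np.linalg.cholesky(cov)
        # cov is symmetric, so row-wise grad @ cov equals cov @ grad.
        drift = numerical_grad(log_target, x) @ cov
        noise = rng.standard_normal(x.shape) @ chol.T
        x = x + step * drift + np.sqrt(2 * step) * noise
    return x


# Hypothetical anisotropic target: variances 1 and 0.01.
precision = np.diag([1.0, 100.0])
init = np.random.default_rng(0).standard_normal((200, 2))

# Same dynamics and same noise, two different normalization constants.
run_a = preconditioned_langevin(
    lambda x: log_unnormalized(x, precision, 0.0), init, 0.005, 100, seed=1
)
run_b = preconditioned_langevin(
    lambda x: log_unnormalized(x, precision, 7.0), init, 0.005, 100, seed=1
)
print(np.max(np.abs(run_a - run_b)))
```

The gradient is taken by finite differences on purpose: the constant `log_const` is really present in the function being differentiated, and it cancels in the difference quotient rather than being dropped by hand.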