Sampling from a target probability distribution whose normalization constant is unknown is a fundamental challenge in computational science and engineering. Recent work shows that gradient flows in the space of probability measures open up new avenues for developing sampling algorithms. This paper makes three contributions to this approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as a numerical approximation of the flow to derive an algorithm. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that the gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known to be the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, both in theory and in numerical experiments with particle methods. Our third contribution is to study Gaussian approximations of the gradient flows and to develop efficient algorithms based on them; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
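The first contribution can be made concrete with a short calculation. The sketch below is not taken from the paper: the notation (a potential V, a normalization constant Z, and the parameter θ) is assumed here for illustration, and the two flows are written in their standard textbook forms.

```latex
% Assumed notation: target \rho^\star = e^{-V}/Z with unknown constant Z.
\begin{align*}
\rho^\star(\theta) &= \frac{1}{Z}\, e^{-V(\theta)}, \qquad
\mathrm{KL}[\rho \,\|\, \rho^\star]
  = \int \rho \log \rho \, d\theta + \int \rho\, V \, d\theta + \log Z, \\
\frac{\delta\, \mathrm{KL}}{\delta \rho}
  &= \log \rho + V + 1 + \log Z, \\
\text{Wasserstein:} \quad
\partial_t \rho
  &= \nabla \cdot \Big( \rho \, \nabla \frac{\delta\, \mathrm{KL}}{\delta \rho} \Big)
   = \nabla \cdot ( \rho \, \nabla V ) + \Delta \rho, \\
\text{Fisher-Rao:} \quad
\partial_t \rho
  &= -\rho \Big( \frac{\delta\, \mathrm{KL}}{\delta \rho}
       - \mathbb{E}_\rho \Big[ \frac{\delta\, \mathrm{KL}}{\delta \rho} \Big] \Big)
   = -\rho \, \big( \log \rho + V - \mathbb{E}_\rho [ \log \rho + V ] \big).
\end{align*}
```

In both cases Z enters the first variation only through the additive constant log Z, which is removed by the spatial gradient (Wasserstein) or by subtracting the mean (Fisher-Rao). For a generic f-divergence the first variation has the form f'(ρ/ρ*) = f'(Z ρ e^V), so Z sits inside f' and need not drop out. This only illustrates the direction of the claim; the uniqueness statement itself is established in the paper, not here.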
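The benefit of affine invariance for anisotropic targets can be illustrated on a Gaussian target, where Gaussian approximations are exact and reduce to ordinary differential equations for a mean m and covariance C. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the "plain" flow uses the moment equations of overdamped Langevin dynamics, the "preconditioned" flow uses a natural-gradient form assumed here as a representative affine invariant flow, and the target, step size, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def simulate(rhs, m0, C0, dt=0.01, n_steps=2000):
    """Forward Euler integration of a mean/covariance ODE system."""
    m, C = m0.astype(float).copy(), C0.astype(float).copy()
    for _ in range(n_steps):
        dm, dC = rhs(m, C)
        m, C = m + dt * dm, C + dt * dC
    return m, C

# Hypothetical anisotropic Gaussian target N(mu, Sigma), condition number 100.
mu = np.array([1.0, -2.0])
Sigma = np.diag([1.0, 100.0])
P = np.linalg.inv(Sigma)  # precision matrix; grad log-density is -P (x - mu)

def plain_rhs(m, C):
    # Moment equations of Langevin dynamics dX = -P (X - mu) dt + sqrt(2) dW.
    return -P @ (m - mu), 2.0 * np.eye(2) - P @ C - C @ P

def preconditioned_rhs(m, C):
    # Assumed natural-gradient form: the drift is preconditioned by C.
    return -C @ P @ (m - mu), C - C @ P @ C

m0, C0 = np.array([5.0, 5.0]), np.eye(2)
m_plain, C_plain = simulate(plain_rhs, m0, C0)
m_pre, C_pre = simulate(preconditioned_rhs, m0, C0)

err_plain = np.linalg.norm(m_plain - mu)
err_pre = np.linalg.norm(m_pre - mu)
print(f"mean error at t=20, plain flow:          {err_plain:.3e}")
print(f"mean error at t=20, preconditioned flow: {err_pre:.3e}")
```

In the plain flow the slowest mode decays at a rate set by the smallest eigenvalue of the precision matrix (0.01 here), so the mean is still far from the target at t = 20. In the preconditioned flow the inverse covariance relaxes to the target precision at unit rate regardless of conditioning, which is the kind of behavior the abstract attributes to affine invariant flows.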