Stein Variational Gradient Descent (SVGD) is a popular variational inference algorithm that simulates an interacting particle system to approximately sample from a target distribution, with impressive empirical performance across various domains. Theoretically, its population (i.e., infinite-particle) limit dynamics are well studied, but the behavior of SVGD in the finite-particle regime is much less understood. In this work, we design two computationally efficient variants of SVGD, namely VP-SVGD and GB-SVGD, with provably fast finite-particle convergence rates. We introduce the notion of virtual particles and develop novel stochastic approximations of population-limit SVGD dynamics in the space of probability measures, which are exactly implementable using a finite number of particles. Our algorithms can be viewed as specific random-batch approximations of SVGD, which are computationally more efficient than ordinary SVGD. We show that the $n$ particles output by VP-SVGD and GB-SVGD, run for $T$ steps with batch size $K$, are at least as good as i.i.d. samples from a distribution whose Kernel Stein Discrepancy to the target is at most $O\left(\tfrac{d^{1/3}}{(KT)^{1/6}}\right)$ under standard assumptions. Our results also hold under a mild growth condition on the potential function, which is much weaker than the isoperimetric conditions (e.g., the Poincaré inequality) or information-transport conditions (e.g., Talagrand's inequality $\mathsf{T}_1$) generally considered in prior works. As a corollary, we consider the convergence of the empirical measure (of the particles output by VP-SVGD and GB-SVGD) to the target distribution and demonstrate a double-exponential improvement over the best known finite-particle analysis of SVGD. Beyond this, our results present the first known oracle complexities for this setting with polynomial dimension dependence.
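For orientation, the interacting particle system that ordinary SVGD simulates can be sketched as follows. This is a minimal illustration of the standard SVGD update with an RBF kernel, not an implementation of VP-SVGD or GB-SVGD, whose details are not given in this abstract. The function names `rbf_kernel` and `svgd_step`, the bandwidth `h`, the step size, and the one-dimensional standard Gaussian target are all assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(x, h):
    """Return the RBF Gram matrix and pairwise differences for particles x of shape (n, d)."""
    diff = x[:, None, :] - x[None, :, :]           # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * h ** 2)), diff

def svgd_step(x, grad_log_p, step_size=0.1, h=1.0):
    """One ordinary SVGD update (assumed RBF kernel); uses all n^2 particle pairs."""
    n = x.shape[0]
    k, diff = rbf_kernel(x, h)
    # Driving term: sum_j k(x_i, x_j) * grad log p(x_j)
    drive = k @ grad_log_p(x)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j k(x_i, x_j) * (x_i - x_j) / h^2
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h ** 2
    return x + step_size * (drive + repulse) / n

# Illustrative target (an assumption): standard Gaussian in 1D, so grad log p(x) = -x.
def grad_log_p(x):
    return -x

# Start 20 particles away from the target and run the particle system.
particles = np.linspace(1.0, 3.0, 20).reshape(-1, 1)
for _ in range(1000):
    particles = svgd_step(particles, grad_log_p)
print(particles.mean(), particles.std())
```

Each step of this sketch evaluates the kernel on all pairs of particles, which is the per-step cost that random-batch approximations of SVGD are meant to reduce.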