Stein Variational Gradient Descent (SVGD) is a popular variational inference algorithm that simulates an interacting particle system to approximately sample from a target distribution, with impressive empirical performance across various domains. Theoretically, its population (i.e., infinite-particle) limit dynamics are well studied, but the behavior of SVGD in the finite-particle regime is much less understood. In this work, we design two computationally efficient variants of SVGD, namely VP-SVGD (which is conceptually elegant) and GB-SVGD (which is empirically effective), with provably fast finite-particle convergence rates. We introduce the notion of \emph{virtual particles} and develop novel stochastic approximations of population-limit SVGD dynamics in the space of probability measures, which are exactly implementable using a finite number of particles. Our algorithms can be viewed as specific random-batch approximations of SVGD, which are computationally more efficient than ordinary SVGD. We show that the $n$ particles output by VP-SVGD and GB-SVGD, run for $T$ steps with batch size $K$, are at least as good as i.i.d.\ samples from a distribution whose Kernel Stein Discrepancy to the target is at most $O\left(\tfrac{d^{1/3}}{(KT)^{1/6}}\right)$ under standard assumptions. Our results also hold under a mild growth condition on the potential function, which is much weaker than the isoperimetric conditions (e.g., the Poincar\'e inequality) or information-transport conditions (e.g., Talagrand's inequality $\mathsf{T}_1$) generally considered in prior works. As a corollary, we consider the convergence of the empirical measure (of the particles output by VP-SVGD and GB-SVGD) to the target distribution and demonstrate a \emph{double exponential improvement} over the best known finite-particle analysis of SVGD.
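As a rough illustration of what a random-batch approximation of SVGD can look like, the following is a minimal NumPy sketch. It is not the VP-SVGD or GB-SVGD algorithm of this work. It applies the standard SVGD update with the interaction sum restricted to a random batch of $K$ of the current particles. The RBF kernel, the fixed bandwidth, the standard Gaussian target, and all function and parameter names are assumptions made purely for illustration.

```python
import numpy as np


def rbf_kernel_and_grad(batch, particles, bandwidth):
    """RBF kernel k(batch[j], particles[i]) and its gradient w.r.t. batch[j]."""
    # diffs[j, i] = batch[j] - particles[i], shape (K, n, d)
    diffs = batch[:, None, :] - particles[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)               # (K, n)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))       # (K, n)
    grad_k = -diffs / bandwidth ** 2 * k[:, :, None]     # (K, n, d)
    return k, grad_k


def random_batch_svgd_step(particles, score, step_size, batch_size, bandwidth, rng):
    """One SVGD-style update where each particle interacts with a random batch.

    `score` maps an array of points (m, d) to grad log-density values (m, d).
    Illustrative assumption: the batch is drawn without replacement from the
    current particles, so batch_size must not exceed the number of particles.
    """
    n = particles.shape[0]
    idx = rng.choice(n, size=batch_size, replace=False)
    batch = particles[idx]
    k, grad_k = rbf_kernel_and_grad(batch, particles, bandwidth)
    drive = k.T @ score(batch)        # kernel-weighted scores, shape (n, d)
    repulse = grad_k.sum(axis=0)      # repulsion between particles, shape (n, d)
    return particles + step_size * (drive + repulse) / batch_size


if __name__ == "__main__":
    # Hypothetical usage: target is a standard Gaussian, so score(x) = -x.
    rng = np.random.default_rng(0)
    particles = rng.normal(loc=5.0, size=(50, 2))   # n = 50 particles, d = 2
    for _ in range(500):                            # T = 500 steps, K = 10
        particles = random_batch_svgd_step(
            particles, lambda x: -x, step_size=0.1,
            batch_size=10, bandwidth=1.0, rng=rng,
        )
    print("empirical mean:", particles.mean(axis=0))
```

Each step in this sketch costs $O(nK)$ kernel evaluations rather than the $O(n^2)$ of a full-batch SVGD step, which is the kind of computational saving a random-batch scheme aims for.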