We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is that, starting from a regular initial distribution, the time derivative of the relative entropy between the joint density of the $N$ particle locations and the $N$-fold product target measure splits into a dominant `negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller `positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$ in both continuous and discrete time, a near-optimal (in the sense of matching the corresponding i.i.d. rates) double-exponential improvement over the recent result of Shi and Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension $d$. By adding a bilinear component to the kernel, we use the same approach to further obtain Wasserstein-2 convergence in continuous time. For `bilinear + Mat\'ern' kernels, we derive Wasserstein-2 rates that exhibit a curse of dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws.
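For readers unfamiliar with the objects named above, the following is a minimal illustrative sketch of a discrete-time SVGD iteration and a V-statistic estimate of $\mathsf{KSD}^2$ for $N$ particles. It is not the construction analyzed in the paper: the one-dimensional standard Gaussian target, the Gaussian (RBF) kernel with fixed bandwidth `h`, the step size, the deterministic initialization, and all function names (`svgd_step`, `ksd_squared`, `run_demo`) are assumptions chosen only to keep the example short and self-contained.

```python
import math


def svgd_step(xs, score, h, step):
    """One SVGD update for 1-D particles.

    Assumes the RBF kernel k(x, y) = exp(-(x - y)^2 / (2h)); `score` is the
    derivative of the log target density.
    """
    n = len(xs)
    new_xs = []
    for xi in xs:
        phi = 0.0
        for xj in xs:
            k = math.exp(-(xj - xi) ** 2 / (2.0 * h))
            grad_k = -(xj - xi) / h * k  # derivative of k(xj, xi) in xj
            # Drift term (kernel-weighted score) plus repulsion term.
            phi += k * score(xj) + grad_k
        new_xs.append(xi + step * phi / n)
    return new_xs


def ksd_squared(xs, score, h):
    """V-statistic estimate of KSD^2 of the empirical measure of xs."""
    n = len(xs)
    total = 0.0
    for x in xs:
        for y in xs:
            d = x - y
            k = math.exp(-d * d / (2.0 * h))
            dk_dx = -d / h * k
            dk_dy = d / h * k
            d2k = (1.0 / h - d * d / (h * h)) * k  # mixed derivative in x and y
            # Stein kernel built from the score and the RBF kernel.
            total += (score(x) * score(y) * k + score(x) * dk_dy
                      + score(y) * dk_dx + d2k)
    return total / (n * n)


def run_demo(n=20, h=1.0, step=0.1, iters=500):
    """Run SVGD toward N(0, 1) from particles spread evenly over [2, 4]."""
    score = lambda x: -x  # score of the standard Gaussian (assumed target)
    xs = [2.0 + 2.0 * i / (n - 1) for i in range(n)]
    ksd_before = ksd_squared(xs, score, h)
    for _ in range(iters):
        xs = svgd_step(xs, score, h, step)
    ksd_after = ksd_squared(xs, score, h)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var, ksd_before, ksd_after


if __name__ == "__main__":
    mean, var, ksd_before, ksd_after = run_demo()
    print("mean:", mean, "variance:", var)
    print("KSD^2 before:", ksd_before, "KSD^2 after:", ksd_after)
```

Rerunning `run_demo` with different values of `n` gives an informal feel for how the discrepancy of the final particle configuration depends on the number of particles; it is a toy experiment under the assumptions listed above and does not verify the stated rates.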