Stein Variational Gradient Descent (SVGD) is a nonparametric, particle-based, deterministic sampling algorithm. Despite its wide usage, understanding the theoretical properties of SVGD has remained a challenging problem. When sampling from a Gaussian target, the SVGD dynamics with a bilinear kernel remain Gaussian as long as the initializer is Gaussian. Inspired by this fact, we undertake a detailed theoretical study of Gaussian-SVGD, i.e., SVGD projected onto the family of Gaussian distributions via the bilinear kernel, or equivalently Gaussian variational inference (GVI) with SVGD. We present a complete picture by considering both the mean-field PDE and discrete particle systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD dynamics are proven to converge linearly to the Gaussian distribution closest to the target in KL divergence. In the finite-particle setting, if the target is Gaussian, there is both uniform-in-time convergence to the mean-field limit and linear convergence in time to the equilibrium. In the general case, we propose a density-based and a particle-based implementation of Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from different perspectives, emerge as special cases of our unified framework. Interestingly, one of the new particle-based instances from this framework empirically outperforms existing approaches. Our results make concrete contributions towards a deeper understanding of both SVGD and GVI.
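The claim that a bilinear kernel keeps the SVGD dynamics Gaussian can be illustrated with a minimal NumPy sketch. It assumes the kernel takes the form k(x, y) = xᵀy + 1, which the abstract does not spell out. Under that assumption the SVGD velocity field is affine in the particle position, so an affine image of a Gaussian initializer stays Gaussian. The target parameters `b` and `Q`, the step size, the particle count, and the function names are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def svgd_bilinear_step(X, score, step):
    """One SVGD update, assuming the bilinear kernel k(x, y) = x^T y + 1."""
    n = X.shape[0]
    S = score(X)        # (n, d); row j is grad log p(x_j)
    K = X @ X.T + 1.0   # (n, n); K[i, j] = x_i^T x_j + 1
    # Driving term: (1/n) sum_j k(x_j, x_i) * score(x_j).
    # Repulsive term: (1/n) sum_j grad_{x_j} k(x_j, x_i) = x_i.
    phi = K @ S / n + X
    return X + step * phi

def run_demo(n=50, step=0.02, iters=5000, seed=0):
    """Run the particle system on a hypothetical 2-D Gaussian target N(b, Q)."""
    rng = np.random.default_rng(seed)
    b = np.array([1.0, -1.0])                 # assumed target mean
    Q = np.array([[2.0, 0.5], [0.5, 1.0]])    # assumed target covariance
    Q_inv = np.linalg.inv(Q)

    def score(X):
        # grad log p(x) = -Q^{-1} (x - b); Q_inv is symmetric.
        return -(X - b) @ Q_inv

    X = rng.standard_normal((n, b.size))      # Gaussian initializer
    for _ in range(iters):
        X = svgd_bilinear_step(X, score, step)
    # Empirical mean and (1/n-normalized) covariance of the particles.
    return X.mean(axis=0), np.cov(X.T, bias=True), b, Q

if __name__ == "__main__":
    mean, cov, b, Q = run_demo()
    print("empirical mean:", mean)
    print("empirical covariance:\n", cov)
```

With more particles than dimensions, the empirical mean and covariance in this sketch should approach `b` and `Q`. This matches the finite-particle convergence described above for Gaussian targets, though the step size here is an untuned guess.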