Stein Variational Gradient Descent (SVGD) is a nonparametric, particle-based, deterministic sampling algorithm. Despite its wide usage, understanding the theoretical properties of SVGD has remained a challenging problem. For a Gaussian target, the SVGD dynamics with a bilinear kernel remain Gaussian as long as the initializer is Gaussian. Inspired by this fact, we undertake a detailed theoretical study of Gaussian-SVGD, i.e., SVGD projected to the family of Gaussian distributions via the bilinear kernel, or equivalently Gaussian variational inference (GVI) with SVGD. We present a complete picture by considering both the mean-field PDE and discrete particle systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD dynamics is proven to converge linearly to the Gaussian distribution closest to the target in KL divergence. In the finite-particle setting, when the target is Gaussian, we show both uniform-in-time convergence to the mean-field limit and linear convergence in time to the equilibrium. In the general case, we propose a density-based and a particle-based implementation of Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from different perspectives, emerge as special cases of our unified framework. Interestingly, one of the new particle-based instances from this framework empirically outperforms existing approaches. Our results make concrete contributions towards a deeper understanding of both SVGD and GVI.
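The finite-particle claim for a Gaussian target can be illustrated with a minimal NumPy sketch. It assumes the kernel takes the form k(x, y) = x^T y + 1 (the abstract only says "bilinear kernel"), and the helper names `svgd_bilinear_step` and `run_gaussian_svgd`, the target parameters, the step size, and the particle count are all hypothetical choices made for illustration, not the paper's implementation. With this kernel, the SVGD velocity phi(x) = (1/N) sum_j [k(x_j, x) grad log pi(x_j) + grad_{x_j} k(x_j, x)] is affine in x. Every particle therefore moves by the same affine map, which is why a Gaussian initializer stays Gaussian. At equilibrium, the empirical mean and the 1/N-normalized empirical covariance match those of the target.

```python
import numpy as np


def svgd_bilinear_step(X, grad_log_p, step):
    """One SVGD update with the assumed kernel k(x, y) = x^T y + 1.

    X has shape (N, d); row i is particle x_i.
    """
    n = X.shape[0]
    G = grad_log_p(X)          # (N, d): row j is grad log pi(x_j)
    K = X @ X.T + 1.0          # (N, N): K[i, j] = k(x_i, x_j)
    # Driving term: (1/N) * sum_j k(x_j, x_i) * grad log pi(x_j)
    drive = K @ G / n
    # Repulsive term: (1/N) * sum_j grad_{x_j} k(x_j, x_i) = x_i for this kernel
    repulse = X
    return X + step * (drive + repulse)


def run_gaussian_svgd(mu, sigma, n_particles=50, step=0.02, n_iters=5000, seed=0):
    """Hypothetical driver: SVGD toward a Gaussian target N(mu, sigma)."""
    precision = np.linalg.inv(sigma)

    def grad_log_p(X):
        # For N(mu, sigma): grad log pi(x) = -sigma^{-1} (x - mu), applied row-wise
        return -(X - mu) @ precision.T

    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_particles, len(mu)))  # Gaussian initializer
    for _ in range(n_iters):
        X = svgd_bilinear_step(X, grad_log_p, step)
    return X


if __name__ == "__main__":
    mu = np.array([1.0, -1.0])
    sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
    X = run_gaussian_svgd(mu, sigma)
    print(X.mean(axis=0))           # should be close to mu
    print(np.cov(X.T, bias=True))   # should be close to sigma (1/N normalization)
```

Because the update depends on the particles only through their empirical mean and second moment, this finite-particle system evolves exactly like the Gaussian mean-field dynamics started at the empirical moments. The step size 0.02 is a conservative choice for this particular target and is not tuned.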