Stein Variational Gradient Descent (SVGD) is a nonparametric particle-based deterministic sampling algorithm. Despite its wide usage, understanding the theoretical properties of SVGD has remained a challenging problem. For sampling from a Gaussian target, the SVGD dynamics with a bilinear kernel will remain Gaussian as long as the initializer is Gaussian. Inspired by this fact, we undertake a detailed theoretical study of the Gaussian-SVGD, i.e., SVGD projected onto the family of Gaussian distributions via the bilinear kernel, or equivalently Gaussian variational inference (GVI) with SVGD. We present a complete picture by considering both the mean-field PDE and discrete particle systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD dynamics is proven to converge linearly to the Gaussian distribution closest to the target in KL divergence. In the finite-particle setting, there is both uniform-in-time convergence to the mean-field limit and linear convergence in time to the equilibrium if the target is Gaussian. In the general case, we propose a density-based and a particle-based implementation of the Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from different perspectives, emerge as special cases of our unified framework. Interestingly, one of the new particle-based instances from this framework empirically outperforms existing approaches. Our results make concrete contributions towards obtaining a deeper understanding of both SVGD and GVI.
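The Gaussian-preservation property mentioned above can be illustrated with a small particle sketch. It assumes the bilinear kernel takes the form k(x, y) = xᵀy + 1 (an assumption made here; the abstract does not spell out the kernel) and that the target is a Gaussian N(b, Q). Under these assumptions the standard SVGD direction reduces to an affine map of each particle, φ(x) = A x + c, with A = (1/N) Σⱼ s(xⱼ) xⱼᵀ + I and c = (1/N) Σⱼ s(xⱼ), where s is the score of the target. An affine update is what keeps a Gaussian initializer Gaussian. All function names below are illustrative, not taken from the paper.

```python
import numpy as np


def gaussian_score(x, b, Q_inv):
    """Score of N(b, Q) at each row of x: grad log p(x) = -Q^{-1} (x - b)."""
    return -(x - b) @ Q_inv.T


def svgd_direction_bilinear(x, score):
    """Standard SVGD direction with the assumed kernel k(x, y) = x^T y + 1.

    phi(x_i) = (1/N) sum_j [ k(x_j, x_i) s(x_j) + grad_{x_j} k(x_j, x_i) ]
    """
    n = x.shape[0]
    s = score(x)                # (N, d) scores at the particles
    K = x @ x.T + 1.0           # K[j, i] = k(x_j, x_i)
    drive = K.T @ s / n         # kernel-weighted average of the scores
    repulse = x                 # grad_{x_j} (x_j^T x_i + 1) = x_i for every j
    return drive + repulse


def affine_coefficients(x, score):
    """Coefficients (A, c) such that the direction above equals A x_i + c."""
    n, d = x.shape
    s = score(x)
    A = s.T @ x / n + np.eye(d)  # (1/N) sum_j s(x_j) x_j^T + I
    c = s.mean(axis=0)           # (1/N) sum_j s(x_j)
    return A, c


def svgd_step(x, score, step_size):
    """One explicit Euler step of the particle system."""
    return x + step_size * svgd_direction_bilinear(x, score)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b = np.array([0.5, -1.0])                 # hypothetical target mean
    Q = np.array([[2.0, 0.3], [0.3, 1.0]])    # hypothetical target covariance
    Q_inv = np.linalg.inv(Q)
    particles = rng.normal(size=(50, 2))      # Gaussian initializer
    for _ in range(2000):
        particles = svgd_step(
            particles, lambda z: gaussian_score(z, b, Q_inv), 0.01
        )
    print("empirical mean:", particles.mean(axis=0))
    print("empirical covariance:", np.cov(particles.T, bias=True))
```

The step size and iteration count in the demo are arbitrary choices for illustration. The empirical mean and covariance printed at the end can be compared by eye against `b` and `Q`; no particular numerical accuracy is claimed here.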