We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. Specifically, we prove a quadratic bound on the gradient variance of the STL estimator that also covers misspecified variational families. Combined with previous work on the quadratic variance condition, this directly implies convergence of BBVI with projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve the existing analysis of the regular closed-form entropy gradient estimator, which enables a comparison against the STL estimator, and we provide explicit non-asymptotic complexity guarantees for both.
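To make the contrast between the two estimators concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a Gaussian target and a full-rank Gaussian variational family $q = \mathcal{N}(m, CC^\top)$ with reparameterization $z = m + Cu$, and it only looks at the gradient of $\mathrm{KL}(q \,\|\, p)$ with respect to the location $m$. The function names (`grad_log_gauss`, `stl_grad_m`, `cfe_grad_m`) are hypothetical and chosen for illustration. When the family contains the target and $q$ sits at the optimum, the STL estimate vanishes for every sample, while the closed-form entropy estimate still fluctuates around zero.

```python
import numpy as np


def grad_log_gauss(z, mean, chol):
    """Gradient of log N(z; mean, chol @ chol.T) with respect to z."""
    cov = chol @ chol.T
    return -np.linalg.solve(cov, z - mean)


def stl_grad_m(u, m, C, target_mean, target_chol):
    """STL estimate of d/dm KL(q || p) from one base sample u ~ N(0, I).

    Only the path derivative through z is kept; the parameters of q
    inside log q are treated as constants (the "stop gradient").
    """
    z = m + C @ u  # reparameterized sample, dz/dm = I
    return grad_log_gauss(z, m, C) - grad_log_gauss(z, target_mean, target_chol)


def cfe_grad_m(u, m, C, target_mean, target_chol):
    """Closed-form entropy estimate of d/dm KL(q || p) from one sample.

    The Gaussian entropy does not depend on m, so only the energy
    term -log p(z) contributes.
    """
    z = m + C @ u
    return -grad_log_gauss(z, target_mean, target_chol)


# Illustrative setup (assumed values): a 3-dimensional Gaussian target.
rng = np.random.default_rng(0)
d = 3
target_mean = np.array([1.0, -2.0, 0.5])
target_chol = np.tril(rng.standard_normal((d, d)), k=-1) + 2.0 * np.eye(d)

# Perfect specification: q is placed exactly at the target.
m, C = target_mean.copy(), target_chol.copy()
base_samples = rng.standard_normal((1000, d))

stl = np.array([stl_grad_m(u, m, C, target_mean, target_chol) for u in base_samples])
cfe = np.array([cfe_grad_m(u, m, C, target_mean, target_chol) for u in base_samples])

print("STL total variance at the optimum:", stl.var(axis=0).sum())
print("CFE total variance at the optimum:", cfe.var(axis=0).sum())
```

The sketch covers only the location gradient and a Gaussian target; it illustrates why the STL estimator's variance can vanish at the optimum under perfect specification, and it says nothing about the rates or constants established in the paper.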