Stochastic natural gradient variational inference (NGVI) is a popular posterior inference method with applications in various probabilistic models. Despite its wide usage, little is known about its non-asymptotic convergence rate in the \emph{stochastic} setting. We aim to narrow this gap. For conjugate likelihoods, we prove the first $\mathcal{O}(\frac{1}{T})$ non-asymptotic convergence rate of stochastic NGVI. Its complexity is no worse than that of stochastic gradient descent (a.k.a. black-box variational inference), and the rate likely has a better dependence on constants, which leads to faster convergence in practice. For non-conjugate likelihoods, we show that stochastic NGVI with the canonical parameterization implicitly optimizes a non-convex objective. Thus, a global convergence rate of $\mathcal{O}(\frac{1}{T})$ is unlikely without some significant new understanding of optimizing the ELBO using natural gradients.
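To make the conjugate case concrete, here is a minimal sketch of stochastic NGVI on a toy model: a Gaussian mean with known variance and a Gaussian prior. It is an illustration under stated assumptions, not the setup analyzed in the paper. The model, the function names, and the $1/t$ step size are all illustrative choices. The sketch relies on the standard fact that, for a conjugate exponential-family model, the natural gradient of the ELBO with respect to the natural parameters $\lambda$ equals the exact posterior natural parameters minus $\lambda$, so a single-datum estimate simply rescales one data point's sufficient statistics by the dataset size.

```python
import numpy as np


def exact_natural_params(x, sigma2, mu0, tau02):
    """Exact posterior natural parameters for theta ~ N(mu0, tau02),
    x_i ~ N(theta, sigma2), w.r.t. sufficient statistics (theta, theta^2)."""
    lam1 = mu0 / tau02 + x.sum() / sigma2
    lam2 = -0.5 / tau02 - 0.5 * len(x) / sigma2
    return np.array([lam1, lam2])


def stochastic_ngvi(x, sigma2, mu0, tau02, num_steps, rng):
    """Stochastic NGVI in natural parameters with one data point per step.

    The 1/t step size is an assumption made for this sketch.
    """
    n = len(x)
    prior = np.array([mu0 / tau02, -0.5 / tau02])
    lam = prior.copy()  # initialize the variational posterior at the prior
    for t in range(1, num_steps + 1):
        i = rng.integers(n)  # sample one data point uniformly
        # Unbiased estimate of the exact posterior natural parameters.
        target = prior + n * np.array([x[i] / sigma2, -0.5 / sigma2])
        nat_grad = target - lam  # stochastic natural gradient of the ELBO
        rho = 1.0 / t
        lam = lam + rho * nat_grad
    return lam


def posterior_mean(lam):
    """Mean of the Gaussian with natural parameters (m/v, -1/(2v))."""
    return -lam[0] / (2.0 * lam[1])


rng = np.random.default_rng(0)
sigma2, mu0, tau02 = 1.0, 0.0, 10.0
x = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=50)  # synthetic data

lam_star = exact_natural_params(x, sigma2, mu0, tau02)
lam_hat = stochastic_ngvi(x, sigma2, mu0, tau02, num_steps=20000, rng=rng)

print("exact posterior mean:", posterior_mean(lam_star))
print("NGVI posterior mean: ", posterior_mean(lam_hat))
```

With this step size, the iterate is a running average of the single-datum targets, which is why its squared error shrinks with the number of steps in this toy example. In a non-conjugate model no such closed-form target exists, and the sketch does not apply.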