Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite its empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI algorithm for the case where the variational distributions form an exponential family. Stochasticity arises when gradients are either intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum. For all other combinations of step size and sample/batch size schedules, we establish convergence to the optimum with rates of the form $\mathcal{O}\left(\frac{1}{T^{\rho}}\right)$, possibly with $\rho \geq 1$. These rates apply when the target posterior distribution is close, in some sense, to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resource, or accuracy constraints.
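To make the setting concrete, here is a minimal sketch of a projected stochastic natural-gradient step. It is not the algorithm analyzed in the paper but a toy instance under stated assumptions: a conjugate model (Gaussian likelihood with known variance, Gaussian prior on the mean), a Gaussian variational family written in natural parameters, and minibatch subsampling as the only source of stochasticity. In this conjugate case the natural-gradient step reduces to a convex combination of the current natural parameters and a minibatch estimate of the posterior's natural parameters. The function name `ngvi_gaussian_mean`, its arguments, and the projection bound `eps` are all hypothetical choices made for illustration.

```python
import random


def ngvi_gaussian_mean(data, prior_var=10.0, noise_var=1.0, batch_size=10,
                       num_steps=2000, step_size=None, seed=0):
    """Toy projected stochastic NGVI for q(mu) = N(m, s2).

    Natural parameters: lam1 = m / s2, lam2 = -1 / (2 * s2).
    step_size=None selects the decreasing schedule rho_t = 1 / (t + 1);
    a float selects a constant step size.
    """
    rng = random.Random(seed)
    n = len(data)
    # Start from the natural parameters of the prior N(0, prior_var).
    lam1, lam2 = 0.0, -0.5 / prior_var
    # Hypothetical projection bound (an assumption of this sketch):
    # keep lam2 <= -eps so that q remains a proper Gaussian.
    eps = 1e-8
    for t in range(num_steps):
        rho = step_size if step_size is not None else 1.0 / (t + 1)
        batch = rng.sample(data, batch_size)
        # Unbiased minibatch estimate of the exact posterior's natural parameters.
        hat1 = (n / batch_size) * sum(batch) / noise_var
        hat2 = -0.5 * (1.0 / prior_var + n / noise_var)
        # Natural-gradient step; for a conjugate model it is a convex combination.
        lam1 = (1.0 - rho) * lam1 + rho * hat1
        lam2 = (1.0 - rho) * lam2 + rho * hat2
        # Projection onto the valid domain (never active in this toy example).
        lam2 = min(lam2, -eps)
    # Convert back to mean and variance.
    s2 = -0.5 / lam2
    m = lam1 * s2
    return m, s2
```

Calling `ngvi_gaussian_mean(data)` uses the decreasing step size, which averages the minibatch noise away, while `ngvi_gaussian_mean(data, step_size=0.1)` keeps all hyperparameters fixed and leaves the estimate of the mean fluctuating in a neighborhood of the exact posterior mean. This mirrors, in a very simplified form, the two regimes described in the abstract.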