We provide the first convergence guarantee for full black-box variational inference (BBVI), also known as Monte Carlo variational inference. While preliminary investigations analyzed simplified versions of BBVI (e.g., with a bounded domain, with bounded support, or optimizing only the scale), our setup requires no such algorithmic modifications. Our results hold for log-smooth posterior densities, with and without strong log-concavity, under the location-scale variational family. Our analysis also reveals that certain algorithm design choices commonly employed in practice, in particular nonlinear parameterizations of the scale of the variational approximation, can result in suboptimal convergence rates. Fortunately, running BBVI with proximal stochastic gradient descent (SGD) removes these limitations and thus achieves the strongest known convergence rate guarantees. We validate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems.
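To make the idea of running BBVI with proximal SGD concrete, the following is a minimal sketch and not the authors' implementation. It assumes a mean-field Gaussian location-scale family with a linearly parameterized scale `s`, a toy diagonal Gaussian target whose score is available in closed form, and a fixed step size. All function names (`grad_log_target`, `prox_neg_entropy`, `proximal_sgd_bbvi`) and default values are hypothetical. The stochastic gradient step is taken only on the energy term, and the negative entropy term is handled by its closed-form proximal operator, which keeps the scale positive without a nonlinear reparameterization.

```python
import numpy as np


def grad_log_target(z, mean, prec):
    # Score of a toy diagonal Gaussian target N(mean, diag(1 / prec)).
    # (Assumption: chosen only so the example is self-contained.)
    return -prec * (z - mean)


def prox_neg_entropy(s, step):
    # Proximal operator of h(s) = -sum(log s_i) with step size `step`.
    # Solves x^2 - s*x - step = 0 for the positive root, elementwise,
    # so the output is strictly positive for any real input.
    return 0.5 * (s + np.sqrt(s ** 2 + 4.0 * step))


def proximal_sgd_bbvi(mean, prec, n_iters=5000, step=0.01, n_samples=8, seed=0):
    rng = np.random.default_rng(seed)
    d = mean.shape[0]
    m = np.zeros(d)  # location parameter
    s = np.ones(d)   # scale parameter (linear parameterization)
    for _ in range(n_iters):
        # Reparameterized samples z = m + s * eps from the variational family.
        eps = rng.standard_normal((n_samples, d))
        z = m + s * eps
        g = grad_log_target(z, mean, prec)
        # Monte Carlo gradients of the energy term -E[log p(z)].
        grad_m = -g.mean(axis=0)
        grad_s = -(g * eps).mean(axis=0)
        # Gradient step on the energy, proximal step on the negative entropy.
        m = m - step * grad_m
        s = prox_neg_entropy(s - step * grad_s, step)
    return m, s


if __name__ == "__main__":
    target_mean = np.array([1.0, -2.0])
    target_prec = np.array([4.0, 0.25])
    m, s = proximal_sgd_bbvi(target_mean, target_prec)
    print("location:", m)
    print("scale:", s)
```

For this toy target, the minimizer of the objective is the target itself, so the location should approach `target_mean` and the scale should approach `1 / sqrt(target_prec)`, up to noise from the stochastic gradients.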