Gaussian variational inference and the Laplace approximation are popular alternatives to Markov chain Monte Carlo that formulate Bayesian posterior inference as an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of both methods is that the solution to the optimization problem is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees -- which all involve the (data) asymptotic properties of the global optimum -- are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference with a Gaussian family and the maximum a posteriori (MAP) problem required by the Laplace approximation; and two algorithms -- consistent Laplace approximation (CLA) and consistent stochastic variational inference (CSVI) -- that exploit these properties to find the optimal approximation in the asymptotic regime. Both CLA and CSVI involve a tractable initialization procedure that finds the local basin of the optimum, and CSVI further includes a scaled gradient descent algorithm that provably stays locally confined to that basin. Experiments on nonconvex synthetic and real-data examples show that compared with standard variational and Laplace approximations, both CSVI and CLA improve the likelihood of obtaining the global optimum of their respective optimization problems.
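As a minimal illustration of why nonconvexity matters here, the sketch below computes a one-dimensional Laplace approximation by plain gradient descent on a toy two-mode posterior. It is not the CLA or CSVI algorithm from this work: the mixture target, the step size, the finite-difference derivatives, and all function names are assumptions made for this example only. It shows that the Gaussian returned depends on which basin the initialization falls in.

```python
import math

def neg_log_post(theta):
    """Negative unnormalized log posterior of a toy two-mode target (hypothetical)."""
    a = 0.7 * math.exp(-0.5 * ((theta - 2.0) / 0.5) ** 2)  # dominant mode near +2
    b = 0.3 * math.exp(-0.5 * ((theta + 2.0) / 0.5) ** 2)  # minor mode near -2
    return -math.log(a + b)

def deriv(f, x, h=1e-4):
    """Central finite-difference estimate of the first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_deriv(f, x, h=1e-4):
    """Central finite-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def laplace_approx(theta0, step=0.05, iters=2000):
    """Run gradient descent to a local mode, then fit a Gaussian there.

    Returns (mean, variance), where the variance is the inverse curvature
    of the negative log posterior at the mode that was reached.
    """
    theta = theta0
    for _ in range(iters):
        theta -= step * deriv(neg_log_post, theta)
    curvature = second_deriv(neg_log_post, theta)
    return theta, 1.0 / curvature

# Two initializations on either side of the barrier between the modes.
for init in (1.0, -1.0):
    mean, var = laplace_approx(init)
    print(f"init={init:+.1f}  mean={mean:+.3f}  var={var:.3f}  "
          f"objective={neg_log_post(mean):.3f}")
```

Both runs converge, but only the one started in the basin of the dominant mode reaches the global optimum of the MAP problem. This is the failure mode that an initialization procedure targeting the correct basin is meant to avoid.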