As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming increasingly popular for approximating intractable posterior distributions in large-scale Bayesian models, owing to its comparable efficacy and superior efficiency. Several recent works provide theoretical justification for VI by proving its statistical optimality for parameter estimation in various settings; formal analysis of the algorithmic convergence of VI, however, is still largely lacking. In this paper, we consider the common coordinate ascent variational inference (CAVI) algorithm for implementing mean-field (MF) VI, which optimizes a Kullback--Leibler divergence objective functional over the space of all factorized distributions. Focusing on the two-block case, we analyze the convergence of CAVI by leveraging the extensive toolbox of functional analysis and optimization. We provide general conditions for certifying global or local exponential convergence of CAVI. Specifically, we introduce a new notion of generalized correlation that characterizes how the constituent blocks interact in influencing the VI objective functional and that, according to the theory, quantifies the algorithmic contraction rate of two-block CAVI. As illustrations, we apply the developed theory to a number of examples and derive explicit problem-dependent upper bounds on the algorithmic contraction rate.
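To make the two-block setting concrete, the following is a minimal sketch of CAVI on a standard textbook target, a bivariate Gaussian with correlation `rho`. It is not taken from the paper: the function names `cavi_gaussian` and `empirical_rate` and all parameter values are assumptions chosen for illustration. In this toy case the optimal factors are Gaussian with fixed variances, so only the two means are iterated, and the error in each mean shrinks by a factor of `rho**2` per sweep. This suggests why a correlation-like quantity might be expected to govern the contraction rate, without reproducing the paper's generalized correlation.

```python
def cavi_gaussian(mu, sigma, rho, m2_init, n_iters=10):
    """Two-block CAVI for a bivariate Gaussian target (illustrative sketch).

    mu      : (mu1, mu2), target means
    sigma   : (s1, s2), target standard deviations
    rho     : target correlation, with |rho| < 1
    m2_init : initial mean of the second variational factor
    Returns a list of (m1, m2) variational means, one pair per sweep.
    """
    s1, s2 = sigma
    # Entries of the target precision matrix Lambda = Sigma^{-1}.
    det = 1.0 - rho ** 2
    lam11 = 1.0 / (s1 ** 2 * det)
    lam22 = 1.0 / (s2 ** 2 * det)
    lam12 = -rho / (s1 * s2 * det)

    m2 = m2_init
    history = []
    for _ in range(n_iters):
        # Block 1: optimal q1 is N(m1, 1 / lam11) given the current q2.
        m1 = mu[0] - (lam12 / lam11) * (m2 - mu[1])
        # Block 2: optimal q2 is N(m2, 1 / lam22) given the fresh q1.
        m2 = mu[1] - (lam12 / lam22) * (m1 - mu[0])
        history.append((m1, m2))
    return history


def empirical_rate(history, mu):
    """Ratio of the last two errors in the second block's mean."""
    e_prev = abs(history[-2][1] - mu[1])
    e_last = abs(history[-1][1] - mu[1])
    return e_last / e_prev


if __name__ == "__main__":
    mu = (1.0, -2.0)
    hist = cavi_gaussian(mu, sigma=(1.0, 2.0), rho=0.9, m2_init=5.0)
    # The observed per-sweep contraction matches rho**2 = 0.81.
    print(f"empirical rate: {empirical_rate(hist, mu):.4f}")
```

The number of sweeps is kept small on purpose: with many iterations the error can underflow to zero, and the ratio in `empirical_rate` would then be undefined.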