Amortized variational inference (A-VI) is a method for approximating the intractable posterior distributions that arise in probabilistic models. The defining feature of A-VI is that it learns a global inference function that maps each observation to its local latent variable's approximate posterior. This stands in contrast to the more classical factorized (or mean-field) variational inference (F-VI), which directly learns the parameters of the approximating distribution for each latent variable. In deep generative models, A-VI is used as a computational trick to speed up inference for local latent variables. In this paper, we study A-VI as a general alternative to F-VI for approximate posterior inference. A-VI cannot produce an approximation with a lower Kullback-Leibler divergence than F-VI's optimal solution, because the amortized family is a subset of the factorized family. Thus a central theoretical problem is to characterize when A-VI still attains F-VI's optimal solution. We derive conditions on both the model and the inference function under which A-VI can theoretically achieve F-VI's optimum. We show that for a broad class of hierarchical models, including deep generative models, it is possible to close the gap between A-VI and F-VI. Further, for an even broader class of models, we establish when and how to expand the domain of the inference function to make amortization a feasible strategy. Finally, we prove that for certain models -- including hidden Markov models and Gaussian processes -- A-VI cannot match F-VI's solution, no matter how expressive the inference function is. We also study A-VI empirically [...]
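To make the contrast between F-VI and A-VI concrete, here is a minimal Python sketch on a toy model that is not taken from the paper. It assumes z_n ~ N(0, 1) and x_n | z_n ~ N(z_n, s2), so the exact posterior of each z_n is Gaussian with mean x_n / (1 + s2) and variance s2 / (1 + s2). For simplicity the variational variance is fixed to the exact posterior variance, and the closed-form KL divergence is minimized directly (this equals the negative ELBO up to a constant). The function names, step size, and iteration count are illustrative choices. F-VI learns one mean per data point, while A-VI learns a single linear inference function f(x) = a * x + b shared across data points.

```python
import random


def simulate(n, s2, seed=0):
    # Draw observations from the assumed toy model:
    # z_n ~ N(0, 1), x_n | z_n ~ N(z_n, s2).
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        xs.append(rng.gauss(z, s2 ** 0.5))
    return xs


def posterior_mean(x, s2):
    # Exact posterior mean of z_n given x_n under the toy model.
    return x / (1.0 + s2)


def total_kl(means, xs, s2):
    # Sum over n of KL(q_n || p(z_n | x_n)) with q_n = N(means[n], v_post).
    # With equal variances the KL reduces to a squared distance of means.
    v_post = s2 / (1.0 + s2)
    return sum(
        (m - posterior_mean(x, s2)) ** 2 / (2.0 * v_post)
        for m, x in zip(means, xs)
    )


def fit_fvi(xs, s2, lr=0.05, steps=2000):
    # F-VI: one free variational mean per data point (len(xs) parameters).
    v_post = s2 / (1.0 + s2)
    means = [0.0] * len(xs)
    for _ in range(steps):
        means = [
            m - lr * (m - posterior_mean(x, s2)) / v_post
            for m, x in zip(means, xs)
        ]
    return means


def fit_avi(xs, s2, lr=0.05, steps=2000):
    # A-VI: a shared inference function f(x) = a * x + b (2 parameters),
    # trained by gradient descent on the average KL over the data set.
    v_post = s2 / (1.0 + s2)
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        resid = [a * x + b - posterior_mean(x, s2) for x in xs]
        grad_a = sum(r * x for r, x in zip(resid, xs)) / (n * v_post)
        grad_b = sum(resid) / (n * v_post)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


if __name__ == "__main__":
    s2 = 0.5
    xs = simulate(50, s2)
    fvi_means = fit_fvi(xs, s2)
    a, b = fit_avi(xs, s2)
    avi_means = [a * x + b for x in xs]
    print("F-VI total KL:", total_kl(fvi_means, xs, s2))
    print("A-VI total KL:", total_kl(avi_means, xs, s2))
    print("learned slope a:", a, "target:", 1.0 / (1.0 + s2))
```

In this toy model the optimal variational mean is a linear function of x_n alone, so a two-parameter inference function can reach the same optimum as F-VI's per-data-point parameters. The sketch does not illustrate the negative results mentioned above: in models such as hidden Markov models, the posterior of each latent variable depends on more than its own observation, and a function of x_n alone would not suffice.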