On the Convergence of the ELBO to Entropy Sums

The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many established as well as many novel algorithms for unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases. Learning usually proceeds until parameters have converged to values close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is equal to a sum of entropies at all stationary points of learning. For standard machine learning models with one set of latent and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The result applies under realistic conditions, including finite numbers of data points, any stationary point (including saddle points) and any family of (well-behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many well-known generative models. As concrete examples we discuss Sigmoid Belief Networks, probabilistic PCA and (Gaussian and non-Gaussian) mixture models. The results also apply to standard (Gaussian) variational autoencoders, as has been shown in parallel (Damm et al., 2023). The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family (with constant base measure), and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
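
To make the three-entropy statement concrete, the following is a minimal numerical sketch for one of the model classes named above, a one-dimensional Gaussian mixture. The setup is an illustrative assumption, not taken from the paper: mixing weights, means and per-component variances are learned by EM with exact posteriors as the variational distributions q^(n), the data are synthetic, and all helper names (e_step, m_step, elbo, entropy_sum) are hypothetical. For this model the entropy sum reads F = (1/N) Σ_n H[q^(n)] − H[Cat(π)] − (1/N) Σ_n Σ_c q^(n)(c) · ½ log(2πe σ_c²), i.e. terms (A), (B) and (C). The script compares this sum with the ELBO at the initial (non-stationary) parameters and again after EM has converged to a stationary point.

```python
import math
import random


def log_normal(x, mu, var):
    """Log-density of a 1-D Gaussian N(x; mu, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def e_step(xs, pis, mus, vars_):
    """Exact posteriors q_n(c) = p(c | x_n), used here as the variational distributions."""
    q = []
    for x in xs:
        logs = [math.log(p) + log_normal(x, m, v) for p, m, v in zip(pis, mus, vars_)]
        top = max(logs)  # subtract the max before exponentiating, for numerical stability
        w = [math.exp(l - top) for l in logs]
        s = sum(w)
        q.append([wi / s for wi in w])
    return q


def m_step(xs, q):
    """Closed-form updates of mixing weights, means and per-component variances."""
    n, k = len(xs), len(q[0])
    pis, mus, vars_ = [], [], []
    for c in range(k):
        nc = sum(qn[c] for qn in q)
        mu = sum(qn[c] * x for qn, x in zip(q, xs)) / nc
        var = sum(qn[c] * (x - mu) ** 2 for qn, x in zip(q, xs)) / nc
        pis.append(nc / n)
        mus.append(mu)
        vars_.append(var)
    return pis, mus, vars_


def elbo(xs, q, pis, mus, vars_):
    """(1/N) sum_n E_q[log p(c) + log p(x_n | c) - log q_n(c)]."""
    total = 0.0
    for x, qn in zip(xs, q):
        for qc, p, m, v in zip(qn, pis, mus, vars_):
            if qc > 0.0:  # convention: 0 * log 0 = 0
                total += qc * (math.log(p) + log_normal(x, m, v) - math.log(qc))
    return total / len(xs)


def entropy_sum(q, pis, vars_):
    """(A) mean entropy of q_n, minus (B) prior entropy, minus (C) expected Gaussian entropy."""
    n = len(q)
    h_q = -sum(qc * math.log(qc) for qn in q for qc in qn if qc > 0.0) / n
    h_prior = -sum(p * math.log(p) for p in pis)
    h_obs = sum(qc * 0.5 * math.log(2.0 * math.pi * math.e * v)
                for qn in q for qc, v in zip(qn, vars_)) / n
    return h_q - h_prior - h_obs


# Synthetic data: three unit-variance clusters (an arbitrary choice for illustration).
rng = random.Random(0)
xs = [rng.gauss(m, 1.0) for m in (-4.0, 0.0, 5.0) for _ in range(100)]

# Initialise: uniform weights, means at data quantiles, global variance for every component.
k = 3
srt = sorted(xs)
mean = sum(xs) / len(xs)
pis = [1.0 / k] * k
mus = [srt[int((c + 0.5) * len(xs) / k)] for c in range(k)]
vars_ = [sum((x - mean) ** 2 for x in xs) / len(xs)] * k

q0 = e_step(xs, pis, mus, vars_)
gap_init = elbo(xs, q0, pis, mus, vars_) - entropy_sum(q0, pis, vars_)

# Run EM long enough to reach a stationary point of the likelihood.
for _ in range(500):
    pis, mus, vars_ = m_step(xs, e_step(xs, pis, mus, vars_))

q = e_step(xs, pis, mus, vars_)
gap_final = elbo(xs, q, pis, mus, vars_) - entropy_sum(q, pis, vars_)
print(f"gap at initialisation: {gap_init:.4f}")
print(f"gap after EM:          {gap_final:.2e}")
```

At initialisation the two quantities differ; after convergence the gap should shrink to floating-point round-off. In this example the equality hinges on the stationary values π_c = (1/N) Σ_n q^(n)(c) and σ_c² equal to the q-weighted variance around μ_c, which is why it does not hold away from stationary points.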
