On the Convergence of the ELBO to Entropy Sums

The variational lower bound (also known as the ELBO or free energy) is the central objective of many established and many novel algorithms for unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases, and learning usually proceeds until the parameters have converged to values close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that, for a very large class of generative models, the variational lower bound is equal to a sum of entropies at all stationary points of learning. For standard machine learning models with one set of latent variables and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The result applies under realistic conditions: for finite numbers of data points, at any stationary point (including saddle points), and for any family of (well-behaved) variational distributions. The class of generative models for which we show equality to entropy sums contains many well-known generative models. As concrete examples we discuss Sigmoid Belief Networks, probabilistic PCA, and (Gaussian and non-Gaussian) mixture models. The result also applies to standard (Gaussian) variational autoencoders, which has been shown in parallel (Damm et al., 2023). The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family (with constant base measure), and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
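
As an informal numerical illustration of the three-entropy statement, the following sketch fits a one-dimensional Gaussian mixture model with EM and compares the ELBO with the entropy sum (A) - H[p(z)] - E_q[H[p(x|z)]] before and after learning. It is a minimal check under assumptions that are not part of this work: the toy data, the initial parameters, the iteration count, and all function names (`log_joint`, `e_step`, `m_step`, `elbo`, `entropy_sum`) are hypothetical and chosen only for illustration. Full posteriors are used as variational distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (illustrative assumption): two well-separated 1-D clusters.
x = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(4.0, 0.5, 300)])

def log_joint(x, pi, mu, var):
    """log p(z=k) + log p(x_n | z=k) for all n, k; shape (N, K)."""
    log_lik = -0.5 * np.log(2 * np.pi * var) - 0.5 * (x[:, None] - mu) ** 2 / var
    return np.log(pi) + log_lik

def e_step(x, pi, mu, var):
    """Exact posteriors q_n(z) = p(z | x_n), used as variational distributions."""
    lj = log_joint(x, pi, mu, var)
    lj = lj - lj.max(axis=1, keepdims=True)  # stabilise the exponentials
    r = np.exp(lj)
    return r / r.sum(axis=1, keepdims=True)

def m_step(x, r):
    """Closed-form parameter update for mixing weights, means, variances."""
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

def elbo(x, r, pi, mu, var):
    """(1/N) sum_n E_{q_n}[log p(x_n, z) - log q_n(z)]."""
    log_r = np.log(np.clip(r, 1e-300, None))  # avoid log(0); 0 * log(.) stays 0
    return np.mean(np.sum(r * (log_joint(x, pi, mu, var) - log_r), axis=1))

def entropy_sum(r, pi, var):
    """(A) average entropy of q, minus prior entropy, minus expected observable entropy."""
    log_r = np.log(np.clip(r, 1e-300, None))
    h_q = np.mean(-np.sum(r * log_r, axis=1))        # (A)
    h_prior = -np.sum(pi * np.log(pi))               # H[p(z)]
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * var)   # H[p(x | z=k)] per component
    h_obs = np.mean(r @ h_gauss)                     # averaged E_q[H[p(x | z)]]
    return h_q - h_prior - h_obs

# Arbitrary (non-stationary) starting point.
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
r = e_step(x, pi, mu, var)
gap_init = abs(elbo(x, r, pi, mu, var) - entropy_sum(r, pi, var))

# Run EM long enough to reach a stationary point on this easy data set.
for _ in range(200):
    r = e_step(x, pi, mu, var)
    pi, mu, var = m_step(x, r)
r = e_step(x, pi, mu, var)
gap_final = abs(elbo(x, r, pi, mu, var) - entropy_sum(r, pi, var))

print("gap before learning:", gap_init)
print("gap after learning: ", gap_final)
```

On this toy problem the gap is expected to be clearly nonzero at the initial parameters and to vanish up to numerical precision once EM has converged. The sketch only illustrates the statement for one simple mixture model and is not a substitute for the general proof.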
