The Shrinkage-Delinkage Trade-off: An Analysis of Factorized Gaussian Approximations for Variational Inference

When factorized approximations are used for variational inference (VI), they tend to underestimate the uncertainty -- as measured in various ways -- of the distributions they are meant to approximate. We consider two popular ways to measure the uncertainty deficit of VI: (i) the degree to which it underestimates the componentwise variance, and (ii) the degree to which it underestimates the entropy. To better understand these effects, and the relationship between them, we examine an informative setting where they can be explicitly (and elegantly) analyzed: the approximation of a Gaussian,~$p$, with a dense covariance matrix, by a Gaussian,~$q$, with a diagonal covariance matrix. We prove that $q$ always underestimates both the componentwise variance and the entropy of $p$, \textit{though not necessarily to the same degree}. Moreover, we demonstrate that the entropy of $q$ is determined by the trade-off of two competing forces: it is decreased by the shrinkage of its componentwise variances (our first measure of uncertainty), but it is increased by the factorized approximation, which delinks the nodes in the graphical model of $p$. We study various manifestations of this trade-off, notably one where, as the dimension of the problem grows, the per-component entropy gap between $p$ and $q$ becomes vanishingly small even though $q$ underestimates every componentwise variance by a constant multiplicative factor. We also use the shrinkage-delinkage trade-off to bound the entropy gap in terms of the problem dimension and the condition number of the correlation matrix of $p$. Finally, we present empirical results on both Gaussian and non-Gaussian targets, the former to validate our analysis and the latter to explore its limitations.
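As a rough numerical illustration of the trade-off described above (a sketch, not code from the paper), the following NumPy snippet assumes that $q$ is chosen to minimize $\mathrm{KL}(q\,\|\,p)$; under that assumption the optimal diagonal variances are the reciprocals of the diagonal entries of the precision matrix of $p$. The equicorrelated target, the value $\rho = 0.5$, and all function names are illustrative choices that do not come from the source, and this target is only one possible instance of the regime the abstract mentions.

```python
import numpy as np


def factorized_gaussian_variances(cov):
    """Variances of the diagonal Gaussian q minimizing KL(q || p), p = N(mu, cov).

    Assumption: reverse-KL VI, for which q's precision matches the diagonal
    of p's precision matrix.
    """
    precision = np.linalg.inv(cov)
    return 1.0 / np.diag(precision)


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)


def equicorrelated_cov(n, rho):
    """Unit variances and a common pairwise correlation rho (hypothetical target)."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


def summarize(n, rho=0.5):
    """Return (largest variance ratio q/p, per-component entropy gap)."""
    cov = equicorrelated_cov(n, rho)
    q_var = factorized_gaussian_variances(cov)
    shrinkage = q_var / np.diag(cov)  # ratio below 1 means q underestimates
    gap = gaussian_entropy(cov) - gaussian_entropy(np.diag(q_var))
    return shrinkage.max(), gap / n


for n in (10, 100, 1000):
    ratio, per_component_gap = summarize(n)
    print(f"n={n:5d}  variance ratio={ratio:.4f}  "
          f"entropy gap per component={per_component_gap:.5f}")
```

For this particular target the variance ratio stays close to $1 - \rho$ as $n$ grows, while the entropy gap divided by $n$ shrinks, which is consistent with the behavior the abstract describes.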
