Variational autoencoders (VAEs) are a popular framework for modeling complex data distributions; they can be trained efficiently via variational inference by maximizing the evidence lower bound (ELBO), at the expense of a gap to the exact (log-)marginal likelihood. While VAEs are commonly used for representation learning, it is unclear why ELBO maximization would yield useful representations, since unregularized maximum likelihood estimation cannot invert the data-generating process. Yet VAEs often succeed at this task. We seek to elucidate this apparent paradox by studying nonlinear VAEs in the limit of near-deterministic decoders. We first prove that, in this regime, the optimal encoder approximately inverts the decoder, a commonly used but unproven conjecture that we refer to as self-consistency. Leveraging self-consistency, we show that the ELBO converges to a regularized log-likelihood. This allows VAEs to perform what has recently been termed independent mechanism analysis (IMA): the regularization adds an inductive bias towards decoders with column-orthogonal Jacobians, which helps recover the true latent factors. The gap between ELBO and log-likelihood is therefore welcome, since it brings unanticipated benefits for nonlinear representation learning. In experiments on synthetic and image data, we show that VAEs uncover the true latent factors when the data-generating process satisfies the IMA assumption.
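To make "column-orthogonal Jacobians" concrete, the following is a minimal sketch of one way to measure how far a square Jacobian is from having orthogonal columns: the sum of the log column norms minus log |det J|. By Hadamard's inequality this quantity is non-negative and equals zero exactly when the columns are orthogonal. The function name `ima_contrast` and the choice of this particular measure are assumptions made for illustration; the abstract does not state the exact form of the regularizer.

```python
import math


def ima_contrast(jacobian):
    """Sum of log column norms minus log|det J| for a square matrix.

    `jacobian` is a list of rows. The result is >= 0 (Hadamard's
    inequality) and is 0 when the columns are mutually orthogonal.
    Illustrative helper, not taken from the paper.
    """
    n = len(jacobian)
    if any(len(row) != n for row in jacobian):
        raise ValueError("expected a square Jacobian")

    # Sum over columns of the log Euclidean norm of each column.
    log_col_norms = sum(
        0.5 * math.log(sum(jacobian[r][c] ** 2 for r in range(n)))
        for c in range(n)
    )

    # log|det J| via Gaussian elimination with partial pivoting.
    a = [[float(v) for v in row] for row in jacobian]
    log_abs_det = 0.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("singular Jacobian")
        a[k], a[pivot] = a[pivot], a[k]
        log_abs_det += math.log(abs(a[k][k]))
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]

    return log_col_norms - log_abs_det


# Columns (1, 1) and (-2, 2) are orthogonal, so the value is ~0.
orthogonal_columns = ima_contrast([[1, -2], [1, 2]])
# A shear has non-orthogonal columns, so the value is strictly positive.
sheared = ima_contrast([[1, 1], [0, 1]])
```

In a VAE setting the matrix would be the decoder Jacobian evaluated at a latent point; here plain nested lists stand in for it so the sketch has no dependencies.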