The Variational Autoencoder (VAE) is known to suffer from \textit{posterior collapse}, in which the latent representations produced by the model become independent of the inputs. The resulting representations are degenerate, a failure that has been attributed to limitations of the VAE objective function. In this work, we propose a solution to this issue: Contrastive Regularization for Variational Autoencoders (CR-VAE). The core of our approach is to augment the original VAE objective with a contrastive term that maximizes the mutual information between the representations of similar visual inputs. This strategy maximizes the information flow between the input and its latent representation, effectively avoiding posterior collapse. We evaluate our method on a series of visual datasets and demonstrate that CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.
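To make the idea of augmenting the VAE objective concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes a diagonal Gaussian posterior with a standard normal prior, and it assumes an InfoNCE-style contrastive term computed between the latent codes of two augmented views of the same images; the abstract does not specify the exact contrastive loss. The function names, the `temperature` parameter, and the `weight` parameter are hypothetical.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """Per-sample KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)


def info_nce(z_a, z_b, temperature=0.5):
    """InfoNCE loss (assumed contrastive term, not confirmed by the source).

    Row i of z_a and row i of z_b are treated as a positive pair (two views
    of the same input); every other row of z_b acts as a negative.
    """
    # Cosine similarity between every pair of codes, scaled by temperature.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Row-wise log-softmax, shifted by the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index as the target class.
    return -np.mean(np.diag(log_prob))


def cr_vae_loss(recon_error, mu, logvar, z_a, z_b, weight=1.0):
    """Negative ELBO plus a weighted contrastive regularizer (hypothetical form).

    recon_error: per-sample reconstruction error, shape (batch,)
    mu, logvar:  posterior parameters, shape (batch, latent_dim)
    z_a, z_b:    latent codes of two views, shape (batch, latent_dim)
    """
    neg_elbo = np.mean(recon_error + gaussian_kl(mu, logvar))
    return neg_elbo + weight * info_nce(z_a, z_b, temperature=0.5)
```

In this sketch, a collapsed posterior (`mu = 0`, `logvar = 0`) drives the KL term to zero but gives the contrastive term nothing to distinguish positive pairs from negatives, so the regularizer penalizes exactly the input-independent codes that characterize posterior collapse. Minimizing InfoNCE is commonly motivated as maximizing a lower bound on the mutual information between the two views' representations.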