Since their inception, Variational Autoencoders (VAEs) have become central to machine learning. Despite their widespread use, numerous questions regarding their theoretical properties remain open. Using PAC-Bayesian theory, this work develops statistical guarantees for VAEs. First, we derive the first PAC-Bayesian bound for posterior distributions conditioned on individual samples from the data-generating distribution. Then, we use this result to develop generalization guarantees for the VAE's reconstruction loss, as well as upper bounds on the distance between the input and the regenerated distributions. More importantly, we provide upper bounds on the Wasserstein distance between the input distribution and the distribution defined by the VAE's generative model.