Since their inception, Variational Autoencoders (VAEs) have become central in machine learning. Despite their widespread use, numerous questions regarding their theoretical properties remain open. Using PAC-Bayesian theory, this work develops statistical guarantees for VAEs. First, we derive the first PAC-Bayesian bound for posterior distributions conditioned on individual samples from the data-generating distribution. Then, we utilize this result to develop generalization guarantees for the VAE's reconstruction loss, as well as upper bounds on the distance between the input and the regenerated distributions. More importantly, we provide upper bounds on the Wasserstein distance between the input distribution and the distribution defined by the VAE's generative model.