We consider the problem of learning Variational Autoencoders (VAEs), a type of deep generative model, from data with missing values. Such data is omnipresent in real-world applications of machine learning because complete data is often impossible or too costly to obtain. We focus in particular on improving a VAE's amortized posterior inference, i.e., the encoder, which in the presence of missing data can be susceptible to learning posterior distributions that are inconsistent with respect to the missingness. To this end, we provide a formal definition of posterior consistency and propose an approach for regularizing an encoder's posterior distribution that promotes this consistency. We observe that the proposed regularization suggests a different training objective from the one typically considered in the literature for handling missing values. Furthermore, we empirically demonstrate that our regularization improves performance in missing-value settings, in terms of both reconstruction quality and downstream tasks that utilize uncertainty in the latent space. This improvement can be observed for many classes of VAEs, including VAEs equipped with normalizing flows.