We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions. We introduce two strategies based on (i) finite variational-mixture and (ii) imputation-based variational-mixture distributions to address the increased posterior complexity. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data.