We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables relative to the fully observed case. This added complexity can degrade the fit of the model, because the variational distribution may no longer match the model posterior. To address the increased posterior complexity, we introduce two strategies based on (i) finite variational-mixture distributions and (ii) imputation-based variational-mixture distributions. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data.
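One way to see why missing data can complicate the posterior, and what the two mixture families might look like, is sketched below. The notation is an assumption made for illustration and is not taken from the abstract: x_obs and x_mis denote the observed and missing parts of a data point, z the latent variables, theta and phi the model and variational parameters, K the number of mixture components, and pi_k the mixture weights. The first line is the standard marginalisation identity; the second and third lines are only plausible forms of strategies (i) and (ii), not necessarily the exact parameterisations used in the paper.

```latex
% Assumed notation: x_obs / x_mis = observed / missing parts, z = latents.
% The incomplete-data posterior is a continuous mixture of complete-data
% posteriors, which is why it can be more complex than each of them.
\begin{align}
p_\theta(z \mid x_{\mathrm{obs}})
  &= \int p_\theta(z \mid x_{\mathrm{obs}}, x_{\mathrm{mis}})\,
          p_\theta(x_{\mathrm{mis}} \mid x_{\mathrm{obs}})\,
          \mathrm{d}x_{\mathrm{mis}} \\
% (i) finite variational mixture (hypothetical form), weights sum to one
q_\phi(z \mid x_{\mathrm{obs}})
  &= \sum_{k=1}^{K} \pi_k(x_{\mathrm{obs}})\, q_{\phi,k}(z \mid x_{\mathrm{obs}}),
  \qquad \sum_{k=1}^{K} \pi_k(x_{\mathrm{obs}}) = 1 \\
% (ii) imputation-based mixture (hypothetical form), one component per
% imputation x_mis^{(k)} of the missing values
q_\phi(z \mid x_{\mathrm{obs}})
  &\approx \frac{1}{K} \sum_{k=1}^{K}
     q_\phi\bigl(z \mid x_{\mathrm{obs}}, x_{\mathrm{mis}}^{(k)}\bigr)
\end{align}
```

Under these assumptions, a single Gaussian variational distribution would have to approximate an integral over many complete-data posteriors, whereas either mixture form mirrors the structure of the first line more closely.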