The variational autoencoder (VAE) is a popular deep latent variable model used to analyse high-dimensional datasets by learning a low-dimensional latent representation of the data. It simultaneously learns a generative model and an inference network to perform approximate posterior inference. Recently proposed extensions to VAEs that can handle temporal and longitudinal data have applications in healthcare, behavioural modelling, and predictive maintenance. However, these extensions do not account for heterogeneous data (i.e., data comprising both continuous and discrete attributes), which is common in many real-life applications. In this work, we propose the heterogeneous longitudinal VAE (HL-VAE), which extends the existing temporal and longitudinal VAEs to heterogeneous data. HL-VAE provides efficient inference for high-dimensional datasets and includes likelihood models for continuous, count, categorical, and ordinal data while accounting for missing observations. We demonstrate our model's efficacy on simulated as well as clinical datasets, and show that our proposed model achieves competitive performance in missing value imputation and predictive accuracy.
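To make the idea of per-attribute likelihood models with missing observations concrete, the following is a minimal sketch and not the authors' implementation. It assumes one attribute of each type named in the abstract (Gaussian for continuous, Poisson for count, categorical, and a cumulative-logit model for ordinal) and a binary mask marking which entries were observed. All function names, the `LOGLIK` table, and the parameter values are hypothetical. In an actual VAE the likelihood parameters would be produced by the decoder network, whereas here they are fixed by hand.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log density of N(mean, var) at x (continuous attribute)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def poisson_loglik(x, rate):
    """Log probability of count x under Poisson(rate)."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def categorical_loglik(x, probs):
    """Log probability of class index x under a categorical distribution."""
    return math.log(probs[x])


def ordinal_loglik(x, score, thresholds):
    """Log probability of level x under a cumulative-logit model (assumed form).

    P(level <= j) = sigmoid(thresholds[j] - score), with increasing thresholds;
    K levels need K - 1 thresholds.
    """
    def cdf(j):
        if j < 0:
            return 0.0
        if j >= len(thresholds):
            return 1.0
        return 1.0 / (1.0 + math.exp(-(thresholds[j] - score)))

    return math.log(cdf(x) - cdf(x - 1))


# Hypothetical lookup from attribute type to its log-likelihood function.
LOGLIK = {
    "gaussian": gaussian_loglik,
    "poisson": poisson_loglik,
    "categorical": categorical_loglik,
    "ordinal": ordinal_loglik,
}


def masked_loglik(row, mask, kinds, params):
    """Sum per-attribute log-likelihoods over observed entries only.

    Entries with mask == False are skipped, so a missing value contributes
    nothing to the reconstruction term and its placeholder is never read.
    """
    total = 0.0
    for x, observed, kind, p in zip(row, mask, kinds, params):
        if observed:
            total += LOGLIK[kind](x, **p)
    return total


# Illustrative values only; not taken from the paper.
kinds = ["gaussian", "poisson", "categorical", "ordinal"]
params = [
    {"mean": 0.0, "var": 1.0},
    {"rate": 1.0},
    {"probs": [0.25, 0.75]},
    {"score": 0.0, "thresholds": [0.0]},
]

full = masked_loglik([0.0, 0, 1, 0], [True, True, True, True], kinds, params)
partial = masked_loglik([0.0, None, 1, None], [True, False, True, False], kinds, params)
print(full, partial)
```

Skipping masked entries, rather than filling them with a default value, keeps unobserved attributes from influencing the objective. Under the same assumptions, a missing entry could afterwards be imputed from the fitted likelihood parameters, for example with the Gaussian mean or the most probable category.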