Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology, but real-world datasets pose notable challenges for practitioners: they can be high-dimensional, they can contain structured missingness patterns, and their measurement time points can be governed by an unknown stochastic process. While various solutions have been suggested, most have been designed to address only one of these challenges. In this work, we propose a flexible and efficient latent-variable model that addresses all of them. Our approach uses Gaussian processes to capture temporal correlations between samples and their associated missingness masks, as well as to model the underlying point process. We construct our model as a variational autoencoder with encoder and decoder models parameterised by deep neural networks, and develop a scalable amortised variational inference approach for efficient model training. We demonstrate competitive performance on both simulated and real datasets.
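
As a rough illustration of two ingredients named above, the following minimal sketch draws a latent trajectory from a Gaussian process prior at irregular measurement times and scores a reconstruction only on observed entries. It is not the proposed model: the squared-exponential kernel, the function names (`rbf_kernel`, `sample_latent`, `masked_sq_error`) and all parameter values are assumptions made for illustration, and the mask is only used to skip missing entries rather than being modelled itself.

```python
import numpy as np

def rbf_kernel(t, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between all pairs of time points (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_latent(t, rng, jitter=1e-6):
    """Draw one latent trajectory z(t) ~ GP(0, k) at the given times."""
    K = rbf_kernel(t) + jitter * np.eye(len(t))  # jitter keeps the Cholesky factor stable
    return np.linalg.cholesky(K) @ rng.standard_normal(len(t))

def masked_sq_error(x, x_hat, mask):
    """Mean squared error over observed entries only (mask == 1)."""
    mask = mask.astype(float)
    return float(np.sum(mask * (x - x_hat) ** 2) / np.sum(mask))

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=8))  # irregularly spaced measurement times
z = sample_latent(t, rng)                    # temporally correlated latent values
```

Nearby time points receive strongly correlated latent values under this kernel, which is the sense in which a Gaussian process prior encodes temporal structure; a full model would pass such latents through a learned decoder.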