Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology, but real-world datasets pose notable challenges for practitioners: they can be high-dimensional, can contain structured missingness patterns, and can have measurement time points governed by an unknown stochastic process. While various solutions have been suggested, most have been designed to account for only one of these challenges. In this work, we propose a flexible and efficient latent-variable model that is capable of addressing all of these challenges. Our approach uses Gaussian processes to capture temporal correlations between samples and their associated missingness masks, as well as to model the underlying point process. We construct our model as a variational autoencoder with encoder and decoder models parameterised by deep neural networks, and develop a scalable amortised variational inference approach for efficient model training. We demonstrate competitive performance on both simulated and real datasets.