In this paper, we propose a novel framework, the Sampling-guided Heterogeneous Graph Neural Network (SHT-GNN), for imputing missing data in longitudinal studies. Traditional methods often require extensive preprocessing to handle irregular or inconsistent missingness; our approach accommodates arbitrary missing-data patterns while remaining computationally efficient. SHT-GNN models observations and covariates as distinct node types. Observation nodes at successive time points are connected within subject-specific longitudinal subnetworks, and covariate-observation interactions are represented by attributed edges in bipartite graphs. Subject-wise mini-batch sampling and a multi-layer temporal smoothing mechanism allow SHT-GNN to scale to large datasets while learning node representations and imputing missing values. Extensive experiments on synthetic and real-world datasets, including the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, demonstrate that SHT-GNN significantly outperforms existing imputation methods, even at high missing-data rates. The empirical results highlight SHT-GNN's robust imputation performance, particularly on complex, large-scale longitudinal data.
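The graph construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_graph` and `sample_subject_batch`, the `(subject_id, time, values)` record layout, and the use of the time gap as the attribute on longitudinal edges are all assumptions made for illustration. The sketch covers only the graph structure and the subject-wise sampling; the GNN layers and the temporal smoothing mechanism are not shown.

```python
import math
import random
from collections import defaultdict


def build_graph(records, n_covariates):
    """Build the two edge sets from long-format longitudinal records.

    records: list of (subject_id, time, values); NaN in `values` marks a
    missing entry. Observation node i is records[i]; covariate node j is
    column j. (Hypothetical layout, chosen for this sketch.)
    """
    # Visit observations subject by subject, in time order.
    order = sorted(range(len(records)), key=lambda i: (records[i][0], records[i][1]))

    bipartite = []     # (obs_node, covariate_node, observed_value)
    longitudinal = []  # (earlier_obs_node, later_obs_node, time_gap)
    by_subject = defaultdict(list)

    for i in order:
        subject, _, values = records[i]
        if len(values) != n_covariates:
            raise ValueError("each record needs one value per covariate")
        by_subject[subject].append(i)
        # Only observed entries become attributed edges; missing entries
        # are the edges whose attributes an imputation model would predict.
        for j, v in enumerate(values):
            if not math.isnan(v):
                bipartite.append((i, j, v))

    # Link each subject's observations at successive time points.
    for obs in by_subject.values():
        for a, b in zip(obs, obs[1:]):
            longitudinal.append((a, b, records[b][1] - records[a][1]))

    return bipartite, longitudinal, dict(by_subject)


def sample_subject_batch(graph, batch_size, rng):
    """Sample whole subjects and keep only the edges touching their observations."""
    bipartite, longitudinal, by_subject = graph
    subjects = rng.sample(sorted(by_subject), batch_size)
    keep = {i for s in subjects for i in by_subject[s]}
    # Longitudinal edges never cross subjects, so checking one endpoint suffices.
    return (
        subjects,
        [e for e in bipartite if e[0] in keep],
        [e for e in longitudinal if e[0] in keep],
    )


nan = float("nan")
records = [
    ("s1", 0.0, [1.0, nan, 3.0]),
    ("s1", 6.0, [nan, 2.0, 3.5]),
    ("s2", 0.0, [0.5, 0.7, nan]),
    ("s1", 12.0, [1.2, nan, nan]),
]
graph = build_graph(records, n_covariates=3)
batch = sample_subject_batch(graph, batch_size=1, rng=random.Random(0))
```

Sampling by subject rather than by individual node keeps each longitudinal subnetwork intact within a batch, which is presumably why the abstract pairs subject-wise sampling with the temporal mechanism.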