Missing data are common in neuroimaging, and ignoring them in downstream statistical analysis can introduce bias and lead to misleading inferential conclusions. It is therefore crucial to apply appropriate statistical methods to address this issue. While multiple imputation is a popular technique for handling missing data, its application to neuroimaging data is hindered by the high dimensionality and complex dependence structures of multivariate neuroimaging variables. To tackle this challenge, we propose a novel approach based on Bayesian models, named High Dimensional Multiple Imputation (HIMA). HIMA introduces a new computational strategy for sampling large covariance matrices based on a robustly estimated posterior mode, which drastically improves computational efficiency and numerical stability. To assess the effectiveness of HIMA, we conducted extensive simulation studies and a real-data analysis using neuroimaging data from a schizophrenia study. HIMA improves computational efficiency by a factor of more than 2000 compared with traditional approaches, while also producing imputed datasets with improved precision and stability.
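For readers unfamiliar with model-based imputation, the following is a minimal sketch of the conditional-normal imputation step that multivariate-normal approaches build on. It is not the HIMA algorithm: the function names `impute_row` and `multiple_impute`, the `shrink` parameter, and the simple diagonal-shrinkage covariance estimate are all assumptions made for illustration, and the mean and covariance are held fixed as plug-in estimates, whereas a fully Bayesian multiple imputation would also draw them from their posterior.

```python
import numpy as np


def impute_row(x, mu, sigma, rng):
    """Fill the NaN entries of one row with a draw from the normal
    distribution of the missing entries given the observed ones."""
    miss = np.isnan(x)
    out = x.copy()
    if not miss.any():
        return out
    obs = ~miss
    if not obs.any():
        # Nothing observed: fall back to a draw from the marginal distribution.
        out[:] = rng.multivariate_normal(mu, sigma)
        return out
    s_oo = sigma[np.ix_(obs, obs)]
    s_mo = sigma[np.ix_(miss, obs)]
    s_mm = sigma[np.ix_(miss, miss)]
    # Regression coefficients of missing on observed: S_mo @ inv(S_oo).
    coef = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu[miss] + coef @ (x[obs] - mu[obs])
    cond_cov = s_mm - coef @ s_mo.T
    cond_cov = (cond_cov + cond_cov.T) / 2  # symmetrize against round-off
    out[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return out


def multiple_impute(data, m=5, shrink=0.1, seed=0):
    """Return m completed copies of `data` (n rows, p >= 2 columns, NaN = missing).

    Illustrative assumption: mu and sigma are fixed plug-in estimates,
    with sigma shrunk toward its diagonal to keep it well conditioned.
    """
    rng = np.random.default_rng(seed)
    mu = np.nanmean(data, axis=0)
    filled = np.where(np.isnan(data), mu, data)  # mean-fill only to estimate sigma
    s = np.cov(filled, rowvar=False)
    sigma = (1 - shrink) * s + shrink * np.diag(np.diag(s))
    return [
        np.vstack([impute_row(row, mu, sigma, rng) for row in data])
        for _ in range(m)
    ]
```

Each completed dataset would then be analyzed separately and the results pooled. The cost of the linear solve grows quickly with the number of variables, which is one reason high-dimensional settings call for more specialized strategies.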