Including a large number of predictors in the imputation model underlying a multiple imputation (MI) procedure is one of the most challenging tasks imputers face. A variety of high-dimensional MI techniques can help, but there has been limited research on their relative performance. In this study, we investigated a wide range of extant high-dimensional MI techniques that can handle both a large number of predictors in the imputation models and general missing data patterns. We assessed the relative performance of seven high-dimensional MI methods with a Monte Carlo simulation study and a resampling study based on real survey data. We defined performance as the degree to which a method yields unbiased and confidence-valid estimates of the parameters of complete-data analysis models. We found that using a lasso penalty or forward selection to select the predictors in the MI model, and using principal component analysis to reduce the dimensionality of the auxiliary data, produced the best results.
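To make one of the highlighted strategies concrete, here is a rough Python sketch of forward selection of imputation-model predictors. It is not the authors' implementation. It assumes a single incomplete variable with fully observed predictors. Everything else in it is a hypothetical choice made for illustration: the helper names `ols`, `forward_select`, `impute_forward` and `pool_mean`, the relative-RSS stopping rule `min_gain`, a bootstrap of the observed cases to approximate parameter uncertainty, and normal residual noise for the imputed draws. The completed datasets are pooled with Rubin's rules. With general missing data patterns, a step like this would typically run once per incomplete variable inside a chained-equations loop, which is not shown.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, ..., bk] from the normal
    equations, solved by Gauss-Jordan elimination with partial pivoting."""
    Z = [[1.0, *row] for row in X]
    p = len(Z[0])
    # Augmented system [Z'Z | Z'y]; a tiny ridge on the slope diagonal guards
    # against exactly collinear candidate predictors (illustrative choice).
    A = [
        [sum(z[i] * z[j] for z in Z) + (1e-9 if i == j and i > 0 else 0.0) for j in range(p)]
        + [sum(z[i] * yi for z, yi in zip(Z, y))]
        for i in range(p)
    ]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * pc for a, pc in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def rss(X, y, cols) -> float:
    """Residual sum of squares of y regressed on columns `cols` of X."""
    sub = [[row[j] for j in cols] for row in X]
    b = ols(sub, y)
    return sum((yi - b[0] - sum(bj * xj for bj, xj in zip(b[1:], r))) ** 2
               for r, yi in zip(sub, y))


def forward_select(X, y, max_vars: int, min_gain: float = 0.01) -> List[int]:
    """Greedy forward selection: add the predictor that lowers the RSS most;
    stop at max_vars or when the relative RSS drop is at most min_gain."""
    chosen: List[int] = []
    current = rss(X, y, chosen)  # intercept-only model
    while len(chosen) < max_vars:
        candidates = [j for j in range(len(X[0])) if j not in chosen]
        if not candidates:
            break
        best = min(candidates, key=lambda j: rss(X, y, chosen + [j]))
        new = rss(X, y, chosen + [best])
        if current - new <= min_gain * current:
            break
        chosen.append(best)
        current = new
    return chosen


def impute_forward(X, y: Sequence[Optional[float]], m: int = 5,
                   max_vars: int = 3, seed: int = 0) -> List[List[float]]:
    """Return m completed copies of y (None marks a missing value). Each copy
    redoes selection and fitting on a bootstrap sample of observed cases,
    then imputes prediction + normal noise with the residual SD."""
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    completed = []
    for _ in range(m):
        boot = [rng.choice(obs) for _ in obs]
        Xb, yb = [X[i] for i in boot], [y[i] for i in boot]
        cols = forward_select(Xb, yb, max_vars)
        b = ols([[row[j] for j in cols] for row in Xb], yb)
        sigma = math.sqrt(rss(Xb, yb, cols) / max(len(yb) - len(cols) - 1, 1))
        filled = list(y)
        for i in mis:
            mu = b[0] + sum(bj * X[i][j] for bj, j in zip(b[1:], cols))
            filled[i] = rng.gauss(mu, sigma)
        completed.append(filled)
    return completed


def pool_mean(completed) -> Tuple[float, float]:
    """Rubin's rules for the mean of y: pooled estimate and total variance."""
    m = len(completed)
    q = [statistics.fmean(c) for c in completed]
    u = [statistics.variance(c) / len(c) for c in completed]
    between = statistics.variance(q)
    return statistics.fmean(q), statistics.fmean(u) + (1 + 1 / m) * between


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
    y = [2.0 + 1.5 * r[0] - 1.0 * r[3] + rng.gauss(0, 0.5) for r in X]
    # MAR: y is more often missing when x0 is large.
    y_mis = [None if r[0] > 0.5 and rng.random() < 0.6 else v for r, v in zip(X, y)]
    est, var = pool_mean(impute_forward(X, y_mis, m=5))
    print(f"pooled mean {est:.3f} (SE {math.sqrt(var):.3f}); "
          f"complete-data mean {statistics.fmean(y):.3f}")
```

Replacing `forward_select` with a lasso fit, or replacing the columns of `X` with their leading principal components, would give rough analogues of the other two strategies the abstract highlights. Neither variant is shown here.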