Multivariate imputation by chained equations (MICE) is one of the most popular approaches for addressing missing values in a data set. This approach requires specifying a univariate imputation model for every variable under imputation, and deciding which predictors to include in each of these univariate models can be a daunting task. Principal component analysis (PCA) can simplify this process by replacing all of the potential imputation model predictors with a few components that summarize their variance. In this article, we extend the combined use of PCA and MICE with a supervised aspect, whereby information from the variables under imputation is incorporated into the estimation of the principal components. We conducted an extensive simulation study to assess the statistical properties of MICE combined with different versions of supervised dimensionality reduction, and we compared them with classical unsupervised PCA as a simpler dimensionality reduction technique.
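To make the idea concrete, the following is a minimal sketch of a single univariate imputation step in which the candidate predictors are replaced by a few principal component scores. It assumes NumPy, a fully observed predictor matrix, and a normal linear imputation model. The supervised variant shown here is one simple, illustrative form of supervised PCA (keep only the predictors most correlated with the observed part of the target, then run PCA on them); it is not necessarily the version studied in the article. All function names (`pca_scores`, `supervised_pca_scores`, `impute_step`) and parameters such as `n_keep` are hypothetical. A proper multiple-imputation step would also draw the regression coefficients and residual variance from their posterior, and full MICE would cycle this step over every incomplete variable; the sketch does neither.

```python
import numpy as np

def pca_scores(X, n_components):
    """Principal component scores of the column-standardized matrix X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the PCA loadings of the standardized data
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def supervised_pca_scores(X, y_obs, obs, n_components, n_keep):
    """Hypothetical supervised variant: screen predictors by absolute
    correlation with the observed target values, then run PCA on the kept ones."""
    r = np.array([abs(np.corrcoef(X[obs, j], y_obs)[0, 1]) for j in range(X.shape[1])])
    keep = np.argsort(r)[::-1][:n_keep]  # indices of the n_keep strongest predictors
    return pca_scores(X[:, keep], n_components)

def impute_step(y, X, n_components=2, rng=None, supervised=False, n_keep=5):
    """One simplified imputation step: regress observed y on component scores,
    then fill missing y with predictions plus normal noise.
    Assumption: coefficients are NOT drawn from a posterior (improper imputation)."""
    rng = np.random.default_rng() if rng is None else rng
    obs = ~np.isnan(y)
    if supervised:
        T = supervised_pca_scores(X, y[obs], obs, n_components, n_keep)
    else:
        T = pca_scores(X, n_components)
    D = np.column_stack([np.ones(len(y)), T])  # intercept + component scores
    beta, *_ = np.linalg.lstsq(D[obs], y[obs], rcond=None)
    resid = y[obs] - D[obs] @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - D.shape[1]))
    y_imp = y.copy()
    y_imp[~obs] = D[~obs] @ beta + rng.normal(0.0, sigma, (~obs).sum())
    return y_imp

# Usage on simulated data (illustrative only)
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
y[rng.random(n) < 0.3] = np.nan  # roughly 30% missing completely at random
y_imp = impute_step(y, X, n_components=3, rng=rng)
y_sup = impute_step(y, X, n_components=3, rng=rng, supervised=True, n_keep=4)
```

The unsupervised call summarizes all ten predictors regardless of their relation to `y`, whereas the supervised call lets the observed part of `y` decide which predictors enter the components, which is the distinction the abstract draws between classical and supervised dimensionality reduction.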