Multivariate imputation by chained equations (MICE) is one of the most popular approaches to handling missing values in a data set. It requires specifying a univariate imputation model for every variable under imputation, and deciding which predictors to include in each of these models can be a daunting task. Principal component analysis (PCA) can simplify this process by replacing all of the potential imputation model predictors with a few components that summarize their variance. In this article, we extend the use of PCA with MICE by adding a supervised aspect: information from the variables under imputation is incorporated into the estimation of the principal components. We conducted an extensive simulation study to assess the statistical properties of MICE with different versions of supervised dimensionality reduction, and we compared them with classical unsupervised PCA as a simpler dimensionality reduction technique.
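The abstract does not give implementation details, so the following is only an illustrative sketch, assuming NumPy, of what a single PCA-based imputation step for one incomplete variable might look like. The potential predictors `Z` are standardized and reduced to a few principal component scores, a linear model for the variable under imputation is fitted on those scores, and missing entries are filled with predictions plus residual noise. The optional "supervised" branch is one simple, commonly used form of supervised PCA (screening predictors by their correlation with the observed values of the variable before running PCA). It is not necessarily one of the versions studied in the article. All function and parameter names (`pca_scores`, `impute_one_variable`, `n_keep`, `noise`) are hypothetical.

```python
import numpy as np


def pca_scores(Z, n_components):
    """Return the first `n_components` principal component scores of Z.

    Columns are standardized first, so this is PCA on the correlation matrix.
    """
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    # Thin SVD: Zs = U @ diag(S) @ Vt; the component scores are U * S.
    U, S, _ = np.linalg.svd(Zs, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def impute_one_variable(y, Z, n_components=2, n_keep=None, noise=True, rng=None):
    """Hypothetical single chained-equations step for one variable y.

    y      : 1-D array, np.nan marks missing entries.
    Z      : 2-D array of (currently complete) potential predictors.
    n_keep : if given, keep only the n_keep predictors most correlated with
             the observed part of y before PCA (a simple supervised screen).
             Must be >= n_components.
    noise  : add normal residual noise to the predictions (simplified draw;
             regression parameters are NOT drawn from a posterior here).
    """
    rng = np.random.default_rng() if rng is None else rng
    miss = np.isnan(y)
    obs = ~miss

    if n_keep is not None:
        # Supervised step: rank predictors by |correlation| with observed y.
        corr = np.array([abs(np.corrcoef(Z[obs, j], y[obs])[0, 1])
                         for j in range(Z.shape[1])])
        keep = np.argsort(corr)[::-1][:n_keep]
        Z = Z[:, keep]

    T = pca_scores(Z, n_components)             # n x n_components scores
    X = np.column_stack([np.ones(len(y)), T])   # add an intercept column

    # Least-squares fit on the rows where y is observed.
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    resid = y[obs] - X[obs] @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - X.shape[1]))

    y_imp = y.copy()
    pred = X[miss] @ beta
    y_imp[miss] = pred + (rng.normal(0.0, sigma, miss.sum()) if noise else 0.0)
    return y_imp
```

In a full MICE run, a step like this would be repeated for every incomplete variable over several iterations, with `Z` rebuilt from the current imputations of the other variables. Established implementations such as the R `mice` package also draw the regression parameters from their posterior distribution, which this sketch omits for brevity.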