Monotone missing data is a common problem in data analysis. However, combining imputation with dimensionality reduction can be computationally expensive, especially as datasets grow in size. To address this issue, we propose a Blockwise principal component analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data, merges the resulting principal components, and then imputes the merged components using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation, which makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. They also show that while applying MICE imputation directly to the missing data may fail to converge, applying BPI with MICE may lead to convergence.
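The blockwise procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pca_scores`, `bpi`, `mean_impute`), the way blocks are passed as column slices, and the number of components kept per block are all assumptions made for illustration. Column-mean imputation stands in for the "chosen imputation technique"; a method such as MICE could be passed in its place.

```python
import numpy as np


def pca_scores(block, n_components):
    """Return PCA scores of a fully observed block (PCA computed via SVD)."""
    centered = block - block.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def mean_impute(Z):
    """Stand-in imputer (an assumption): fill NaNs with the column mean."""
    col_means = np.nanmean(Z, axis=0)
    return np.where(np.isnan(Z), col_means, Z)


def bpi(X, block_slices, n_components, impute):
    """Hypothetical BPI sketch: PCA per block on observed rows, merge, impute."""
    n_rows = X.shape[0]
    reduced = []
    for cols, k in zip(block_slices, n_components):
        block = X[:, cols]
        # Under a monotone pattern, a row is either fully observed or
        # fully missing within a block.
        observed = ~np.isnan(block).any(axis=1)
        scores = np.full((n_rows, k), np.nan)
        scores[observed] = pca_scores(block[observed], k)
        reduced.append(scores)
    merged = np.hstack(reduced)  # low-dimensional data, still with missing rows
    return impute(merged)


# Synthetic monotone pattern: later blocks are observed on fewer rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X[60:, 4:7] = np.nan   # block 2 observed on rows 0..59
X[40:, 7:10] = np.nan  # block 3 observed on rows 0..39

blocks = [slice(0, 4), slice(4, 7), slice(7, 10)]
Z = bpi(X, blocks, n_components=[2, 2, 2], impute=mean_impute)
```

In this sketch the imputer sees a 100 x 6 matrix rather than the original 100 x 10 one, which reflects the source of the time savings claimed above: imputation runs on the merged principal components instead of on the full-dimensional data.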