Monotone missing data is a common problem in data analysis. However, combining imputation with dimensionality reduction can be computationally expensive, especially as datasets grow. To address this issue, we propose a Blockwise Principal Component Analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework performs Principal Component Analysis (PCA) on the observed part of each monotone block of the data, merges the resulting principal components, and then imputes the merged data using a chosen imputation technique. BPI works with various imputation techniques and can significantly reduce imputation time compared with performing dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments confirm the speed improvement. They also show that while applying MICE (Multiple Imputation by Chained Equations) directly to the missing data may fail to converge, applying BPI with MICE to the same data may lead to convergence.
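The BPI pipeline (PCA per monotone block on its observed rows, merge the component scores, then impute) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `pca_scores`, `mean_impute`, and `bpi`, and the parameters `blocks` and `k_per_block`, are hypothetical. PCA is computed from an SVD of the centered observed rows, and column-mean imputation stands in for the chosen imputation technique (such as MICE). The sketch also assumes that within each block a row is either fully observed or fully missing, and that each `k` does not exceed the rank of that block's observed submatrix.

```python
import numpy as np

def pca_scores(X, k):
    """Project the rows of X onto its top-k principal components (SVD of centered X)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def mean_impute(Z):
    """Stand-in imputer (an assumption): fill NaNs with the mean of each column's observed entries."""
    Z = Z.copy()
    col_means = np.nanmean(Z, axis=0)
    rows, cols = np.where(np.isnan(Z))
    Z[rows, cols] = col_means[cols]
    return Z

def bpi(X, blocks, k_per_block, impute=mean_impute):
    """Hypothetical BPI sketch.

    X: (n, p) array with NaN marking missing values in a monotone pattern.
    blocks: list of column-index lists, one per monotone block.
    k_per_block: number of principal components kept for each block.
    """
    n = X.shape[0]
    reduced = []
    for cols, k in zip(blocks, k_per_block):
        B = X[:, cols]
        observed = ~np.isnan(B).any(axis=1)  # rows where the whole block is observed
        scores = np.full((n, k), np.nan)     # unobserved rows stay missing
        scores[observed] = pca_scores(B[observed], k)
        reduced.append(scores)
    Z = np.hstack(reduced)  # merged component scores, still monotone-missing
    return impute(Z)

# Usage: two blocks of three columns; the second block is missing for the last 40 rows
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[60:, 3:6] = np.nan
Z = bpi(X, blocks=[[0, 1, 2], [3, 4, 5]], k_per_block=[2, 2])
print(Z.shape)  # (100, 4)
```

The imputation step runs on 4 merged score columns instead of the original 6, which illustrates where the time savings described above come from. To try a MICE-style imputer instead of the mean stand-in, one option (an assumption about the reader's environment, not part of the framework) is scikit-learn's `IterativeImputer`, enabled via `from sklearn.experimental import enable_iterative_imputer` and passed as `impute=lambda Z: IterativeImputer().fit_transform(Z)`.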