Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure can be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.
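The idea of imputing in a dimension-reduced space can be illustrated with a rough sketch. It is not the authors' implementation, and every name in it (`fit_ridge_logistic`, `cv_view_predictions`, the simulated data) is hypothetical. It assumes a binary outcome, view-wise missingness, a ridge-penalized logistic regression fitted by gradient descent as the per-view base learner, and plain column-mean imputation as a stand-in for algorithms such as missForest or predictive mean matching. Each view is first reduced to a single column of cross-validated predictions, so a missing view contributes one missing cell instead of one per feature.

```python
import numpy as np

def fit_ridge_logistic(X, y, lam=1.0, n_iter=200, lr=0.1):
    """Ridge-penalized logistic regression via gradient descent (stand-in learner)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) / n + lam * w / n)
        b -= lr * np.mean(prob - y)
    return w, b

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def cv_view_predictions(X, y, observed, n_folds=5, seed=0):
    """Out-of-fold predictions for one view; NaN where the view is missing."""
    z = np.full(len(y), np.nan)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(np.flatnonzero(observed)), n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w, b = fit_ridge_logistic(X[train], y[train])
        z[folds[k]] = predict_proba(X[folds[k]], w, b)
    return z

# Simulated multi-view data (assumption): 3 views of 50 features each.
rng = np.random.default_rng(42)
n, p, n_views = 200, 50, 3
views = [rng.normal(size=(n, p)) for _ in range(n_views)]
signal = sum(Xv[:, 0] for Xv in views) + rng.normal(scale=0.5, size=n)
y = (signal > 0).astype(float)

# View-wise missingness: a missing view loses all of its features at once.
observed = rng.random((n, n_views)) > 0.3
for v in range(n_views):
    views[v][~observed[:, v]] = np.nan

# Step 1: reduce each view to one column of cross-validated predictions.
Z = np.column_stack(
    [cv_view_predictions(views[v], y, observed[:, v]) for v in range(n_views)]
)

# Step 2: impute in the reduced n x n_views space (mean imputation as a stand-in).
col_means = np.nanmean(Z, axis=0)
Z_imputed = np.where(np.isnan(Z), col_means, Z)

# Step 3: fit a meta-learner on the completed reduced matrix.
meta_w, meta_b = fit_ridge_logistic(Z_imputed, y)

raw_missing = sum(int(np.isnan(Xv).sum()) for Xv in views)
reduced_missing = int(np.isnan(Z).sum())
print("missing cells, raw features:", raw_missing)
print("missing cells, reduced space:", reduced_missing)
```

In this toy setup the imputation step sees a matrix with 3 columns rather than 150, which is where the reduction in computational load would come from. A more expensive imputation algorithm could be swapped in at step 2 without changing the rest of the sketch.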