Data in which a set of objects is described by multiple distinct feature sets (called views) are known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This leads to very large quantities of missing data which, especially when combined with high dimensionality, make the application of conditional imputation methods computationally infeasible. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with that of several existing imputation algorithms on simulated data sets. The results show that the new imputation method achieves competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.
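To make the idea of imputing in a dimension-reduced space concrete, the following is a minimal sketch under several assumptions and should not be read as the authors' implementation. Each view is reduced to a single column of cross-validated predictions from a view-specific penalized logistic regression, and imputation is then carried out on the resulting n-by-V matrix rather than on the original features. The function names (`fit_ridge_logistic`, `cross_val_predictions`, `reduced_space_imputation`), the ridge penalty fitted by gradient descent, the simulated data, and the use of column-mean imputation are all illustrative stand-ins. In practice a more advanced imputer such as missForest or predictive mean matching could take the place of the mean-imputation step.

```python
import numpy as np

def sigmoid(t):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def fit_ridge_logistic(X, y, lam=1.0, n_iter=500, lr=0.1):
    """L2-penalized logistic regression fitted by plain gradient descent
    (a stand-in for the penalized base learners used in stacking)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        resid = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ resid / n + lam * w / n)
        b -= lr * resid.mean()
    return w, b

def cross_val_predictions(X, y, n_folds=5, seed=0):
    """Out-of-fold predicted probabilities for a single view."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    z = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        w, b = fit_ridge_logistic(X[~test], y[~test])
        z[test] = sigmoid(X[test] @ w + b)
    return z

def reduced_space_imputation(views, y):
    """views: list of (n, p_v) arrays; an all-NaN row marks a missing view.
    Returns the imputed n-by-V prediction matrix and the number of cells
    that had to be imputed in that reduced space."""
    Z = np.full((len(y), len(views)), np.nan)
    for v, X in enumerate(views):
        observed = ~np.isnan(X).any(axis=1)
        Z[observed, v] = cross_val_predictions(X[observed], y[observed])
    n_missing = int(np.isnan(Z).sum())
    # Placeholder imputer: column means. Any imputation method could be
    # applied here, since Z has only one column per view.
    Z_imputed = np.where(np.isnan(Z), np.nanmean(Z, axis=0), Z)
    return Z_imputed, n_missing

# Simulated example (assumed setup): 3 views of 50 features each, with
# each view missing entirely for roughly 20% of the objects.
rng = np.random.default_rng(1)
n, p = 200, 50
y = rng.integers(0, 2, size=n).astype(float)
views = []
for _ in range(3):
    X = rng.normal(size=(n, p)) + 0.5 * y[:, None]
    X[rng.random(n) < 0.2] = np.nan
    views.append(X)

Z, n_missing_Z = reduced_space_imputation(views, y)
n_missing_X = sum(int(np.isnan(X).sum()) for X in views)

# Meta-learner fitted on the imputed, low-dimensional prediction matrix.
w_meta, b_meta = fit_ridge_logistic(Z, y)
```

In this sketch every missing view contributes 50 missing cells in the original feature space but only one missing cell in `Z`, so `n_missing_X` is exactly 50 times `n_missing_Z`. That reduction is what makes it feasible to run a computationally expensive imputer at this stage.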