Multi-view data are data in which a set of objects is described by multiple distinct feature sets, called views. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This leads to very large quantities of missing data, which, especially when combined with high dimensionality, make the application of conditional imputation methods computationally infeasible. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address the computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with that of several existing imputation algorithms on simulated data sets. The results show that the new method achieves competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.
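The idea of imputing in a dimension-reduced space can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: each view is reduced to one predicted probability per object by a ridge-penalized logistic regression fitted with plain gradient descent, in-sample predictions are used where a stacking procedure would normally use cross-validated ones, and column-mean imputation stands in for an advanced imputer such as missForest or predictive mean matching. All names and sizes (`fit_ridge_logistic`, `N_FEATURES`, and so on) are hypothetical.

```python
import math
import random

random.seed(0)
N_OBJECTS, N_VIEWS, N_FEATURES = 60, 3, 20  # illustrative sizes (assumed)


def sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, steps=200):
    """L2-penalized logistic regression via gradient descent (stand-in learner)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, xi):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))


# Simulate multi-view data in which an entire view is missing for some objects.
y = [random.randint(0, 1) for _ in range(N_OBJECTS)]
views = []
for _ in range(N_VIEWS):
    rows = []
    for i in range(N_OBJECTS):
        if random.random() < 0.2:
            rows.append(None)  # the whole view is missing for object i
        else:
            rows.append([random.gauss(0.5 * y[i], 1.0) for _ in range(N_FEATURES)])
    views.append(rows)

# Step 1: reduce each view to a single prediction per object (None if missing).
Z = [[None] * N_VIEWS for _ in range(N_OBJECTS)]
for v, rows in enumerate(views):
    observed = [i for i, r in enumerate(rows) if r is not None]
    model = fit_ridge_logistic([rows[i] for i in observed],
                               [y[i] for i in observed])
    for i in observed:
        Z[i][v] = predict(model, rows[i])

# Missing cells: one per missing view in Z, N_FEATURES per missing view in X.
n_missing_meta = sum(z is None for row in Z for z in row)
n_missing_features = n_missing_meta * N_FEATURES

# Step 2: impute in the reduced space (column mean as a simple stand-in).
Z_imputed = [row[:] for row in Z]
for v in range(N_VIEWS):
    obs = [row[v] for row in Z if row[v] is not None]
    col_mean = sum(obs) / len(obs)
    for row in Z_imputed:
        if row[v] is None:
            row[v] = col_mean

# Step 3: fit a meta-learner on the completed low-dimensional matrix.
meta_model = fit_ridge_logistic(Z_imputed, y)

print("cells to impute in feature space:", n_missing_features)
print("cells to impute in reduced space:", n_missing_meta)
```

In this sketch the imputer only ever sees a matrix with one column per view, so the number of cells to fill shrinks by a factor equal to the number of features per view. That reduction is what would make a more expensive imputer affordable at this stage.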