As omics data have become more widely available in recent years, more multi-omics data have been generated, that is, high-dimensional molecular data comprising several types, such as genomic, transcriptomic, or proteomic data, all obtained from the same patients. Such data are well suited as covariates in automatic outcome prediction because each omics type may contribute unique information, possibly improving predictions compared with using a single omics data type. Frequently, however, not all omics data types are available for all patients, both in the training data and in the test data, that is, the data to which the automatic prediction rules are to be applied. We refer to this type of data as block-wise missing multi-omics data. First, we review the literature on existing prediction methods applicable to such data. Subsequently, using a collection of 13 publicly available multi-omics data sets, we compare the predictive performance of several of these approaches under different block-wise missingness patterns. Finally, we discuss the results of this empirical comparison study and draw some tentative conclusions.