Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. Integrating multiple heterogeneous data sources is currently a challenge for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for handling the diverse nature of multi-omics inputs, yet it remains an underused tool in genomic data mining. We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics data, and it proved to be a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery, and further development of methods for heterogeneous data integration.
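
The idea of learning from a meta-kernel with a support vector machine can be illustrated with a minimal sketch. It is not the fusion strategy evaluated in this work: it assumes the simplest possible fusion (an unweighted mean of one RBF kernel per omics block), uses synthetic data in place of real omics measurements, and relies on scikit-learn's precomputed-kernel SVM. The helper `fuse_kernels` and the variables `omics_a` and `omics_b` are hypothetical names introduced only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def fuse_kernels(kernels, weights=None):
    """Combine kernel matrices into one meta-kernel by a weighted mean.

    Hypothetical helper: with non-negative weights, the combination of
    positive semi-definite kernels is itself positive semi-definite.
    """
    kernels = np.stack(kernels)  # shape: (n_kernels, n_rows, n_cols)
    if weights is None:
        # Assumption: equal weight for every omics block.
        weights = np.full(len(kernels), 1.0 / len(kernels))
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, kernels, axes=1)


# Synthetic stand-ins for two omics blocks measured on the same 80 samples.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
omics_a = rng.normal(size=(80, 50)) + 1.5 * y[:, None]
omics_b = rng.normal(size=(80, 30)) + 1.5 * y[:, None]

# Hold out every fourth sample for testing.
train = np.arange(80) % 4 != 0
test = ~train

blocks = [omics_a, omics_b]
# One kernel per block; train kernels are (n_train, n_train),
# test kernels are (n_test, n_train), as the precomputed SVM expects.
K_train = fuse_kernels([rbf_kernel(b[train], b[train]) for b in blocks])
K_test = fuse_kernels([rbf_kernel(b[test], b[train]) for b in blocks])

clf = SVC(kernel="precomputed").fit(K_train, y[train])
accuracy = clf.score(K_test, y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the classifier only ever sees the fused kernel matrix, a different fusion rule can be swapped in by changing `fuse_kernels` alone, leaving the per-block kernels and the SVM untouched.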