Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently a challenge for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for handling the diverse nature of multi-omics inputs, yet it remains an underused tool in genomic data mining. We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can compete with more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics genomic data. Our results offer a direction for bio-data mining research and further development of methods for heterogeneous data integration.
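As an illustration of the general idea of learning from a meta-kernel with a support vector machine, the following minimal sketch computes one kernel per omics block, fuses them, and trains an SVM on the fused kernel. It is not the fusion strategy proposed in this work: the unweighted average used here is only the simplest possible stand-in, the data are synthetic, and the block names (`expression`, `methylation`) and the helpers `fuse_kernels` and `block_kernels` are hypothetical names chosen for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def fuse_kernels(kernels, weights=None):
    """Return a convex combination of same-shaped kernel matrices.

    Assumption: a plain weighted average stands in for the fusion step.
    A convex combination of positive semidefinite kernels is itself
    positive semidefinite, so the result is a valid meta-kernel.
    """
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * K for w, K in zip(weights, kernels))


def block_kernels(blocks, rows, cols):
    """One RBF kernel per data block, between samples `rows` and `cols`."""
    return [
        rbf_kernel(X[rows], X[cols], gamma=1.0 / X.shape[1])
        for X in blocks.values()
    ]


# Synthetic stand-in for two omics blocks measured on the same samples.
rng = np.random.default_rng(0)
n_samples = 120
y = rng.integers(0, 2, size=n_samples)
blocks = {
    "expression": rng.normal(size=(n_samples, 50)) + 0.8 * y[:, None],
    "methylation": rng.normal(size=(n_samples, 30)) + 0.8 * y[:, None],
}
train = np.arange(0, 80)
test = np.arange(80, n_samples)

# Meta-kernel on the training samples (train x train) and between
# test and training samples (test x train), as SVC expects.
K_train = fuse_kernels(block_kernels(blocks, train, train))
K_test = fuse_kernels(block_kernels(blocks, test, train))

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y[train])
accuracy = clf.score(K_test, y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

With `kernel="precomputed"`, the SVM never sees the raw features, so each block can use whatever kernel suits its data type; only the fusion step needs to change to try a different integration strategy.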