In this chapter we illustrate the use of several machine learning techniques for the analysis of omics data. Specifically, we review and evaluate Random Forest and Penalized Multinomial Logistic Regression for the integrative analysis of genomics and immunomics in pancreatic cancer. We further propose using association rules for predictive purposes to overcome the low predictive power of these models. Finally, we apply the reviewed methods to a real data set from TCGA comprising 107 pancreatic tumor samples and 117,486 germline SNPs, and show that the proposed methods perform well in predicting immunological infiltration in pancreatic cancer.