Prostate cancer is one of the most common cancers in men. In its advanced stages, tumours can metastasise to other tissues in the body, which is often fatal. In this thesis, we performed a gene expression analysis of prostate cancer tumours at different metastatic sites using data science, machine learning and topological network analysis methods. We present a general procedure for pre-processing gene expression datasets and for pre-filtering significant genes using analytical methods. We then used machine learning models to filter key genes further and to classify tumours by secondary site. Finally, we performed gene co-expression network analysis and community detection on samples from the different prostate cancer secondary site types. Of the 14,379 genes, 13 were selected as those most closely related to metastatic prostate cancer, with a cross-validated classification accuracy of approximately 92%. In addition, we provide preliminary insights into the co-expression patterns of genes within the gene co-expression networks.