Metastatic Breast Cancer Prognostication Through Multimodal Integration of Dimensionality Reduction Algorithms and Classification Algorithms

Machine learning (ML) is a branch of artificial intelligence (AI) in which computers analyze data to find patterns. This study focuses on detecting metastatic cancer using ML. Metastatic cancer, in which the cancer has spread to other parts of the body, causes approximately 90% of cancer-related deaths. Pathologists normally spend hours each day manually classifying tumors as benign or malignant. This tedious task contributes to metastasis being mislabeled over 60% of the time and underscores the need to account for human error and other inefficiencies. ML is a good candidate for improving the identification of metastatic cancer, potentially saving thousands of lives, and for making the process faster and more efficient, reducing the resources and time required. So far, research on cancer detection with AI has relied on deep learning. This study takes a novel approach: it evaluates the potential of combining preprocessing algorithms with classification algorithms to detect metastatic cancer. The study used two preprocessing algorithms, principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset, and then used three classification algorithms, logistic regression, a decision tree classifier, and k-nearest neighbors, to detect metastatic cancer in pathology scans. The highest accuracy, 71.14%, was produced by the ML pipeline comprising PCA, the genetic algorithm, and the k-nearest neighbors algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.
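
The pipeline described in the abstract (PCA, then genetic-algorithm feature selection, then k-nearest neighbors) can be illustrated with a minimal sketch. This is not the study's implementation: the data are synthetic Gaussian features rather than pathology scans, the code uses only NumPy, and every name and setting here (`pca_fit`, `genetic_select`, `run_demo`, 8 components, population size 12, 10 generations, mutation rate 0.1, k = 5) is a hypothetical choice made for illustration.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered data onto the principal axes."""
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote over the k nearest training points (binary 0/1 labels)."""
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def fitness(mask, X_tr, y_tr, X_va, y_va, k=5):
    """Validation accuracy of k-NN restricted to the features in `mask`."""
    if not mask.any():
        return 0.0
    pred = knn_predict(X_tr[:, mask], y_tr, X_va[:, mask], k)
    return float((pred == y_va).mean())

def genetic_select(X_tr, y_tr, X_va, y_va, rng,
                   pop_size=12, generations=10, mutation_rate=0.1):
    """Evolve boolean feature masks; return the best mask seen and its score."""
    n_features = X_tr.shape[1]
    population = rng.random((pop_size, n_features)) < 0.5
    best_mask, best_score = None, -1.0
    for _ in range(generations):
        scores = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in population])
        order = np.argsort(scores)[::-1]
        if scores[order[0]] > best_score:
            best_score = float(scores[order[0]])
            best_mask = population[order[0]].copy()
        parents = population[order[: pop_size // 2]]  # keep the fitter half
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)           # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < mutation_rate  # bit-flip mutation
            children.append(child)
        population = np.array(children)
    return best_mask, best_score

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    n, d = 300, 30
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, :5] += 1.5 * y[:, None]  # only the first 5 features carry signal

    # Train / validation / test split (the data are already in random order).
    X_tr, X_va, X_te = X[:150], X[150:225], X[225:]
    y_tr, y_va, y_te = y[:150], y[150:225], y[225:]

    # Step 1: PCA fitted on the training split only.
    mean, comps = pca_fit(X_tr, n_components=8)
    Z_tr, Z_va, Z_te = (pca_transform(A, mean, comps) for A in (X_tr, X_va, X_te))

    # Step 2: the genetic algorithm picks a subset of principal components.
    mask, val_acc = genetic_select(Z_tr, y_tr, Z_va, y_va, rng)

    # Step 3: k-NN on the selected components, scored on held-out test data.
    test_pred = knn_predict(Z_tr[:, mask], y_tr, Z_te[:, mask])
    test_acc = float((test_pred == y_te).mean())
    return mask, val_acc, test_acc

if __name__ == "__main__":
    mask, val_acc, test_acc = run_demo()
    print("selected components:", np.flatnonzero(mask))
    print(f"validation accuracy: {val_acc:.3f}, test accuracy: {test_acc:.3f}")
```

The sketch fits PCA on the training split alone and scores the genetic algorithm on a separate validation split, so the final test accuracy is not inflated by the feature search. In practice a library such as scikit-learn would likely replace the hand-written PCA and k-NN helpers.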
