Abstract
Breast cancer is a tumor that affects people worldwide, with a higher incidence in women, although men are also affected. It ranks among the five deadliest types of cancer and is particularly prevalent in less developed countries with deficient healthcare programs. Finding the algorithm that predicts breast cancer most effectively, with minimal error, is therefore crucial. In this article, we applied the Synthetic Minority Over-sampling Technique (SMOTE), in conjunction with the R package Shiny, to improve the algorithms and their prediction accuracy. Tumors were classified as benign (B) or malignant (M). Several algorithms were compared on a Kaggle dataset, and logistic regression performed best. We evaluated performance with confusion matrices, to visualize the results, and with the ROC curve, to obtain an overall measure of performance. Additionally, we calculated accuracy as the number of correct predictions divided by the total number of predictions.
Keywords: Breast cancer, SMOTE, Benign, Malignant.
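The following is a minimal Python sketch of two ideas the abstract relies on. It is not the authors' R/Shiny implementation. It shows a simplified SMOTE step: pick a random minority-class sample and one of its k nearest minority neighbours, then create a synthetic point at a random position on the segment between them. It also shows how accuracy (correct predictions over all predictions) differs from precision (true positives over all positive predictions) when computed from confusion-matrix counts. The labels "B"/"M" come from the abstract. Treating "M" as the positive class is an assumption. The function names `smote`, `confusion_counts`, `accuracy` and `precision`, along with all toy data, are hypothetical.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical, simplified SMOTE: create n_new synthetic minority samples by
    interpolating between a random minority sample and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest neighbours of x among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


def confusion_counts(y_true, y_pred, positive="M"):
    """Return (TP, FP, FN, TN), treating `positive` (assumed "M") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """True positives divided by all positive (malignant) predictions."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical toy labels, not taken from the Kaggle dataset.
    y_true = ["M", "M", "B", "B", "B", "B"]
    y_pred = ["M", "M", "M", "B", "B", "B"]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print((tp, fp, fn, tn))  # (2, 1, 0, 3)
    print(round(accuracy(tp, fp, fn, tn), 3), round(precision(tp, fp), 3))  # 0.833 0.667

    # Oversample a hypothetical 2-feature minority class.
    minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.5, 2.5)]
    print(smote(minority, n_new=4, k=2))
```

On the toy labels, accuracy (5/6) and precision (2/3) differ, which is why the two metrics should not be used interchangeably. For real work, a maintained library implementation of SMOTE would normally be preferred over this simplified version.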