Although cancer survivability differs substantially between stages, survival prediction models have traditionally been trained and evaluated on samples pooled across all stages of the disease. This approach can overestimate performance and obscure stage-specific variation. Machine learning (ML)-based cancer survival analysis has a long history in the literature, but few studies have addressed the explainability and transparency of ML survivability models. Using the SEER dataset, we developed and validated explainable ML models to predict stage-specific survivability in colorectal, stomach, and liver cancers. Explainability techniques, including SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), allowed us to reveal important interactions between features and cancer stage that would have remained hidden in traditional black-box models. We identified how specific demographic and clinical variables influence survival differently across cancer stages and types. By focusing on stage-specific models, this study offers new insight into the most important factors at each stage of cancer, providing transparency and potential clinical relevance to support personalized treatment planning.
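The claim that pooling stages can inflate apparent performance can be illustrated with a small synthetic experiment. The sketch below is not the study's pipeline and uses no SEER data. The four stages, the standardized age covariate, and all effect sizes are hypothetical assumptions. Every patient is scored with the true log-odds (an "oracle" model), so any gap between pooled and per-stage AUC comes from how the evaluation set is composed rather than from model fitting. AUC is computed with the standard Mann-Whitney rank formula.

```python
import math
import random


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (average ranks for ties)."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n=2000, seed=0):
    """Synthetic cohort: stage dominates survival; one covariate has a weak effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        stage = rng.randint(1, 4)    # hypothetical stages I-IV
        age_z = rng.gauss(0.0, 1.0)  # hypothetical standardized age
        logit = 4.0 - 1.8 * stage - 0.4 * age_z  # hypothetical effect sizes
        survived = 1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0
        rows.append((stage, logit, survived))
    return rows


rows = simulate()
# Pooled evaluation: all stages mixed in one test set.
pooled_auc = auc([s for _, s, _ in rows], [y for _, _, y in rows])
# Stage-specific evaluation: the same scores, assessed within each stage only.
per_stage_auc = {
    st: auc([s for g, s, _ in rows if g == st], [y for g, _, y in rows if g == st])
    for st in range(1, 5)
}
print(f"pooled AUC: {pooled_auc:.3f}")
for st, value in per_stage_auc.items():
    print(f"stage {st} AUC: {value:.3f}")
```

Under these assumptions the pooled AUC comes out well above every per-stage AUC, because stage alone already separates survivors from non-survivors. Within a single stage, only the weak covariate is left to discriminate. This is the kind of gap that stage-specific models and evaluation are meant to expose. Explanation methods such as SHAP or LIME would then be applied per stage, which this sketch does not show.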