Breast cancer is one of the two cancers responsible for the most deaths in women, with about 42,000 deaths each year in the US. That more than 300,000 breast cancers are newly diagnosed each year suggests that only a fraction of these cancers result in mortality. Most women undergo seemingly curative treatment for localized cancers, but a significant fraction later succumb to metastatic disease, for which current treatments are only temporizing for the vast majority. Current prognostic metrics are of little actionable value for the 4 of 5 women seemingly cured after local treatment, and many women are unnecessarily exposed to morbid and even mortal adjuvant therapies, which reduce metastatic recurrence by only a third. Thus, there is a need for better prognostics to target aggressive treatment at those who are likely to relapse and to spare those who were actually cured. While a plethora of molecular and tumor-marker assays are in use and under development to detect recurrence early, these are time-consuming, expensive, and still often unvalidated as to actionable prognostic utility. A different approach would apply large-data techniques to existing data to determine which clinical and histopathological parameters provide accurate prognostics. Herein, we report on the use of machine learning, together with grid search and Bayesian networks, to develop algorithms that achieve an AUC of up to 0.9 in ROC analyses using only extant data. Such algorithms could be rapidly translated to clinical management, as they do not require testing beyond routine tumor evaluations.
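To make the pairing of grid search with ROC analysis concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a purely synthetic cohort with two hypothetical features (`tumor_size_cm`, `positive_nodes`), a hypothetical linear risk score, and an arbitrary weight grid. None of these come from the study, and the Bayesian network component is not shown. The grid search picks the weight pair with the highest AUC on a tuning cohort, and the chosen weights are then scored on a separate held-out cohort.

```python
import itertools
import random


def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a randomly chosen positive
    case scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def make_synthetic_cohort(n, seed):
    """Synthetic stand-in for a clinical table (assumption: not real data).
    Each row is (tumor_size_cm, positive_nodes); the label is 1 for relapse."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        relapse = 1 if rng.random() < 0.2 else 0  # illustrative relapse rate
        tumor_size_cm = rng.gauss(2.0 + 1.0 * relapse, 1.0)
        positive_nodes = rng.gauss(1.0 + 2.0 * relapse, 1.5)
        rows.append((tumor_size_cm, positive_nodes))
        labels.append(relapse)
    return rows, labels


def risk_scores(rows, w_size, w_nodes):
    """Hypothetical linear risk score: a weighted sum of the two features."""
    return [w_size * size + w_nodes * nodes for size, nodes in rows]


def grid_search(rows, labels, grid):
    """Try every weight pair in the grid; return (best_auc, w_size, w_nodes)."""
    best = None
    for w_size, w_nodes in itertools.product(grid, repeat=2):
        auc = roc_auc(labels, risk_scores(rows, w_size, w_nodes))
        if best is None or auc > best[0]:
            best = (auc, w_size, w_nodes)
    return best


# Tune on one cohort, then evaluate the chosen weights on a separate cohort.
tune_rows, tune_labels = make_synthetic_cohort(400, seed=0)
test_rows, test_labels = make_synthetic_cohort(200, seed=1)

tuned_auc, w_size, w_nodes = grid_search(tune_rows, tune_labels, [0.0, 0.5, 1.0])
held_out_auc = roc_auc(test_labels, risk_scores(test_rows, w_size, w_nodes))

print(f"selected weights: size={w_size}, nodes={w_nodes}")
print(f"AUC on tuning cohort: {tuned_auc:.3f}; on held-out cohort: {held_out_auc:.3f}")
```

Scoring the selected weights on a cohort that played no part in the search keeps the reported AUC from being inflated by the tuning step. In practice, library routines that combine grid search with cross-validation would typically replace the hand-written loop.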