Lung cancer is the leading cause of cancer death globally, and non-small cell lung cancer (NSCLC) is its most prevalent subtype. Among NSCLC patients, approximately 32.3% have mutations in the epidermal growth factor receptor (EGFR) gene. Osimertinib, a third-generation EGFR-tyrosine kinase inhibitor (TKI), has demonstrated remarkable efficacy in treating NSCLC patients with activating and T790M resistance EGFR mutations. Despite this established efficacy, drug resistance remains a significant barrier to patients fully benefiting from osimertinib. No standard tool exists to accurately predict resistance to TKIs, including osimertinib, and this remains a critical obstacle. To bridge this gap, we developed an interpretable multimodal machine learning model that predicts resistance to osimertinib among late-stage NSCLC patients with activating EGFR mutations, achieving a c-index of 0.82 on a multi-institutional dataset. The model uses readily available data routinely collected during patient visits and medical assessments to facilitate precision lung cancer management and informed treatment decisions. By integrating multiple data types, including histology images, next-generation sequencing (NGS) data, demographic data, and clinical records, the multimodal model can generate well-informed recommendations. Our experimental results also show that the multimodal model outperforms single-modality models (c-index 0.82 compared with 0.75 and 0.77), underscoring the benefit of combining multiple modalities in patient outcome prediction.
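For readers unfamiliar with the reported metric, the concordance index (c-index) can be illustrated with a minimal sketch of Harrell's formulation for right-censored data. This is not the authors' evaluation code: the function name `concordance_index`, its arguments, and the toy values are assumptions made for illustration, and tied event times are simply skipped to keep the example short.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's c-index (illustrative sketch, hypothetical helper).

    times       -- observed time to resistance or to censoring
    events      -- 1 if resistance was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means earlier predicted resistance
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # the earlier time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the model ranked the pair correctly
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up, not from the study): four patients, one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.82, 0.75, and 0.77 should be read.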