Cancer treatments can induce cardiotoxicity, which worsens outcomes and survivorship. Identifying cancer patients at risk of heart failure (HF) is therefore critical to improving the safety and outcomes of cancer treatment. This study evaluated machine learning (ML) models for identifying cancer patients at risk of HF from electronic health records (EHRs), including traditional ML models, a time-aware long short-term memory (T-LSTM) network, and large language models (LLMs) that use novel narrative features derived from structured medical codes. We identified a cohort of 12,806 patients at University of Florida Health who were diagnosed with lung, breast, or colorectal cancer, of whom 1,602 developed HF after their cancer diagnosis. The LLM GatorTron-3.9B achieved the best F1 score, outperforming traditional support vector machines by 39%, the T-LSTM deep learning model by 7%, and the widely used transformer model BERT by 5.6%. Our analysis shows that the proposed narrative features substantially increased feature density and improved performance.
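To make the idea of narrative features derived from structured medical codes concrete, the following is a minimal sketch of one way coded EHR events could be rendered as text for a language model. It is an illustration under stated assumptions, not the study's actual pipeline: the lookup table `CODE_DESCRIPTIONS`, the function `events_to_narrative`, and the sentence template are all hypothetical.

```python
from datetime import date

# Hypothetical lookup from medical codes to text descriptions.
# A real table would be built from a full vocabulary such as ICD-10-CM.
CODE_DESCRIPTIONS = {
    "C50.911": "malignant neoplasm of unspecified site of right female breast",
    "I10": "essential (primary) hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
}


def events_to_narrative(events):
    """Render (date, code) pairs as one chronological text string.

    The sentence template is an assumption made for illustration only.
    """
    sentences = []
    previous_day = None
    for day, code in sorted(events):  # order events by date, then by code
        # Fall back to the raw code when no description is available.
        description = CODE_DESCRIPTIONS.get(code, f"code {code}")
        if previous_day is None:
            prefix = "At the first recorded visit"
        else:
            gap = (day - previous_day).days
            prefix = "On the same day" if gap == 0 else f"{gap} days later"
        sentences.append(f"{prefix}, the patient had {description}.")
        previous_day = day
    return " ".join(sentences)


example_events = [
    (date(2020, 3, 1), "I10"),
    (date(2020, 1, 1), "C50.911"),
]
narrative = events_to_narrative(example_events)
print(narrative)
```

Text produced this way could then be tokenized and passed to a transformer model. Writing the time gap between events into the text is only one possible way to preserve the temporal information that a model such as T-LSTM encodes explicitly.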