Clinical notes are an essential component of a health record. This paper evaluates how natural language processing (NLP) can be used to identify the risk of acute care use (ACU) in oncology patients once chemotherapy starts. Risk prediction from structured health data (SHD) is now standard, but prediction from free text is more complex. This paper explores using free-text notes instead of SHD to predict ACU, comparing deep learning models with manually engineered language features. Results show that SHD models outperform NLP models only minimally: an L1-penalised logistic regression with SHD achieved a C-statistic of 0.748 (95% CI: 0.735, 0.762), while the same model with language features achieved 0.730 (95% CI: 0.717, 0.745) and a transformer-based model achieved 0.702 (95% CI: 0.688, 0.717). This paper shows how language models can be used in clinical applications and underlines that risk bias differs across patient groups, even when only free-text data is used.
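The C-statistics and 95% confidence intervals reported above can be illustrated with a minimal sketch. The abstract does not say how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names `c_statistic` and `bootstrap_ci` as well as the toy labels and scores are hypothetical, not the authors' code or data.

```python
import random


def c_statistic(y_true, y_score):
    """Probability that a random positive case is scored above a random
    negative case; ties count as half. Equivalent to the ROC AUC."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic (an assumed method;
    the abstract does not specify how its intervals were computed)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both classes to bootstrap the C-statistic")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy example with made-up labels (1 = acute care use) and predicted risks.
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(labels, risks))  # 0.75
print(bootstrap_ci(labels, risks, n_boot=200))
```

The pairwise loop is quadratic in the number of cases and is meant only to make the definition explicit; on a cohort of realistic size a rank-based implementation would be preferable.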