Monitoring the health status of patients in the Intensive Care Unit (ICU) is critical to providing high-quality care and treatment. Large-scale electronic health records (EHR) give machine learning models an abundance of clinical text and vital-sign data, enabling highly accurate predictions. Although advanced Natural Language Processing (NLP) algorithms for clinical note analysis have emerged, the complex textual structure and noise of raw clinical data still pose significant challenges, and coarse embedding approaches that lack domain-specific refinement limit the accuracy of these algorithms. To address this issue, we propose FINEEHR, a system that uses two representation learning techniques, metric learning and fine-tuning, to refine clinical note embeddings while leveraging the intrinsic correlations among different health statuses and note categories. We evaluate FINEEHR on the real-world MIMIC-III dataset using two metrics: the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUC-PR). Our experimental results show that each refinement approach improves prediction accuracy and that their combination yields the best results. Moreover, FINEEHR outperforms prior work with an AUC improvement of over 10%, achieving an average AUC of 96.04% and an average AUC-PR of 96.48% across various classifiers.
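The abstract does not say which metric-learning objective FINEEHR uses. As a minimal sketch of the general idea, assuming a standard triplet margin loss (our illustrative choice, not necessarily the authors'), the snippet below computes a loss that is zero once embeddings sharing a health-status label sit closer together than embeddings with different labels, by at least a margin. The function names, the margin of 1.0, and the toy vectors and labels are hypothetical; a real pipeline would obtain embeddings from a text encoder and backpropagate this loss, which this pure-Python sketch does not do.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge-style triplet loss (assumed objective, for illustration only).

    It is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(embeddings: List[Vector], labels: List[int],
                       margin: float = 1.0) -> float:
    """Average triplet loss over every valid (anchor, positive, negative) triple.

    A positive shares the anchor's label (e.g. a hypothetical health-status
    class); a negative has a different label.
    """
    losses = []
    n = len(embeddings)
    for i in range(n):
        for j in range(n):
            if i == j or labels[i] != labels[j]:
                continue  # positive must be a distinct item with the same label
            for k in range(n):
                if labels[k] == labels[i]:
                    continue  # negative must have a different label
                losses.append(triplet_loss(embeddings[i], embeddings[j],
                                           embeddings[k], margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # hypothetical health-status classes
    separated = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    collapsed = [[0.0, 0.0]] * 4  # unrefined: all notes map to one point
    print(batch_triplet_loss(separated, labels))  # 0.0
    print(batch_triplet_loss(collapsed, labels))  # 1.0
```

Minimizing such a loss moves the embedding space from the "collapsed" case toward the "separated" one, which is the intuition behind using label correlations to refine note embeddings before a downstream classifier is trained.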