Electronic Health Records (EHRs) provide crucial information for clinical decision-making. However, their high dimensionality, heterogeneity, and sparsity make clinical prediction challenging. Large Language Models (LLMs) have enabled progress on this challenge by leveraging parametric medical knowledge to enhance EHR data for clinical prediction tasks. Despite these achievements, most existing approaches are fundamentally task-agnostic: they deploy LLMs as EHR encoders or EHR completion modules without fully integrating signals from the prediction tasks, which limits task performance. In this work, we propose Rewrite-To-Predict (ReToP), an LLM-based framework that addresses this limitation through end-to-end training of an EHR rewriter and a clinical predictor. To cope with the lack of EHR rewrite training data, we generate synthetic pseudo-labels using clinically driven feature selection strategies to create diverse patient rewrites for fine-tuning the EHR rewriter. ReToP aligns the rewriter with the prediction objective using a novel Classifier Supervised Contribution (CSC) score, which enables the EHR rewriter to generate clinically relevant rewrites that directly enhance prediction. ReToP surpasses strong baseline models across three clinical tasks on MIMIC-IV. Moreover, our analysis shows that ReToP generalizes to unseen datasets and tasks with minimal fine-tuning while preserving faithful rewrites and emphasizing task-relevant predictive features.