Predicting one-year clinical instability and mortality in heart failure patients using sequence modeling

Heart failure (HF) discharge planning depends on identifying patients at risk of deterioration or death, yet accurate prediction from routinely collected electronic health records (EHRs) remains challenging. We developed and validated sequence models for three one-year prediction tasks in a Swedish HF cohort (N = 42,820): clinical instability (a rehospitalization phenotype) and mortality after the initial in-hospital HF diagnosis, and mortality after the latest hospitalization. A modular three-component framework transforms structured EHRs into patient sequences by specifying tokenization strategies, temporal representations, and model configurations. Patient data included diagnoses, vital signs, laboratory results, medications, and procedures. Autoregressive next-token prediction models consistently outperformed alternative training objectives in short-context settings (≤ 512 tokens). The best model (Llama) achieved AUPRCs (95% CI) of 0.555 (0.535-0.575), 0.582 (0.558-0.608), and 0.854 (0.842-0.865) on the three tasks, respectively, with robust calibration. Ablations show that Llama and Mamba variants learn efficient patient representations, with tiny configurations surpassing larger conventional baselines, indicating that model size alone does not improve performance. With limited clinical concepts or training data, Llama maintains strong performance, frequently surpassing full-data baselines. Combining clinical instability and mortality predictions defines four distinct care pathways, from standard primary care to intensive home care, supporting patient-centered decisions at discharge. These findings demonstrate accurate risk prediction from routine hospital data, provide actionable development guidance, and support post-discharge risk stratification.
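
As an illustration of how two risk predictions can define four care pathways, the following is a minimal sketch and not the authors' method. The abstract names only the two end points (standard primary care and intensive home care). The 0.5 thresholds, the labels for the two intermediate pathways, and the names `RiskPrediction` and `assign_pathway` are all hypothetical, as is the assumption that low/low risk maps to standard primary care and high/high risk to intensive home care.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the abstract does not report decision thresholds.
INSTABILITY_THRESHOLD = 0.5
MORTALITY_THRESHOLD = 0.5


@dataclass(frozen=True)
class RiskPrediction:
    """Predicted one-year risks for a single patient (probabilities in [0, 1])."""
    instability: float  # probability of clinical instability
    mortality: float    # probability of death


def assign_pathway(
    pred: RiskPrediction,
    instability_cut: float = INSTABILITY_THRESHOLD,
    mortality_cut: float = MORTALITY_THRESHOLD,
) -> str:
    """Map the two risks onto one of four quadrants.

    Only the two end-point labels come from the abstract; the two
    intermediate labels are placeholders for illustration.
    """
    high_instability = pred.instability >= instability_cut
    high_mortality = pred.mortality >= mortality_cut

    if not high_instability and not high_mortality:
        return "standard primary care"
    if high_instability and high_mortality:
        return "intensive home care"
    if high_instability:
        return "intermediate pathway (high instability, low mortality)"
    return "intermediate pathway (low instability, high mortality)"
```

For example, under these assumed thresholds `assign_pathway(RiskPrediction(instability=0.2, mortality=0.1))` falls in the low/low quadrant. In practice the cut-offs would need to be chosen from calibrated predictions and clinical input.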
