World models offer a principled framework for simulating future states under interventions, but realizing such models in complex, high-stakes domains like medicine remains challenging. Recent large language models (LLMs) have achieved strong performance on static medical reasoning tasks, raising the question of whether they can function as dynamic medical world models capable of simulating disease progression and treatment outcomes over time. In this work, we show that LLMs that merely incorporate medical knowledge struggle to maintain consistent patient states under sequential interventions, leading to error accumulation in long-horizon clinical simulation. To address this limitation, we introduce EHRWorld, a patient-centric medical world model trained under a causal sequential paradigm, together with EHRWorld-110K, a large-scale longitudinal clinical dataset derived from real-world electronic health records. Extensive evaluations demonstrate that EHRWorld significantly outperforms naive LLM-based baselines, achieving more stable long-horizon simulation, improved modeling of clinically sensitive events, and favorable reasoning efficiency. These results highlight the necessity of training on causally grounded, temporally evolving clinical data for reliable and robust medical world modeling.