Simulated patient systems play a crucial role in modern medical education and research, providing safe, integrative learning environments and enabling clinical decision-making simulations. Large Language Models (LLMs) could advance simulated patient systems by replicating medical conditions and patient-doctor interactions with high fidelity and at low cost. However, ensuring the effectiveness and trustworthiness of these systems remains a challenge, as they require a large, diverse, and precise patient knowledge base, along with robust and stable delivery of that knowledge to users. Here, we developed AIPatient, an advanced simulated patient system with the AIPatient Knowledge Graph (AIPatient KG) as the input and the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as the generation backbone. AIPatient KG samples data from Electronic Health Records (EHRs) in the Medical Information Mart for Intensive Care (MIMIC)-III database, producing a clinically diverse and relevant cohort of 1,495 patients with high knowledge base validity (F1 0.89). Reasoning RAG leverages six LLM-powered agents covering retrieval, KG query generation, abstraction, checking, rewriting, and summarization. This agentic framework reaches an overall accuracy of 94.15% in EHR-based medical Question Answering (QA), outperforming benchmarks that use either no agent or only partial agent integration. Our system also shows high readability (median Flesch Reading Ease 77.23; median Flesch-Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1); that is, no statistically significant variation in performance was detected in either test. The promising performance of the AIPatient system highlights its potential to support a wide range of applications, including medical education, model evaluation, and system integration.
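For readers unfamiliar with the two readability metrics reported above, the following is a minimal sketch of the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. It is not the authors' evaluation code: the function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for illustration, so scores will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels.

    Assumption: a crude heuristic, not a pronunciation dictionary.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Standard published coefficients for both formulas.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


if __name__ == "__main__":
    ease, grade = readability("I have had chest pain since last night. It gets worse when I walk.")
    print(f"Flesch Reading Ease: {ease:.2f}, Flesch-Kincaid Grade: {grade:.2f}")
```

Higher Reading Ease values indicate easier text, while the Grade Level approximates the years of US schooling needed to follow it; a median grade of 5.6 therefore suggests responses phrased in plain, conversational language.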