With the rapid advancement of Large Language Models (LLMs) and their strong performance in semantic and contextual comprehension, their potential in specialized domains warrants exploration. This paper introduces the NoteAid EHR Interaction Pipeline, a novel approach that uses generative LLMs to support patient education, a task motivated by the need to help patients understand their Electronic Health Records (EHRs). Building on the NoteAid work, we designed two new tasks from the patient's perspective: explaining EHR content that patients may not understand, and answering questions that patients pose after reading their EHRs. We extracted two datasets, 10,000 instances from MIMIC Discharge Summaries and 876 instances from the MADE medical notes collection, and used them, respectively, to execute the two tasks through the NoteAid EHR Interaction Pipeline. We collected the LLMs' performance data on these tasks and compiled it into the corresponding NoteAid EHR Interaction Dataset. Through a comprehensive LLM-based evaluation of the entire dataset and a rigorous manual evaluation of 64 instances, we demonstrate the potential of LLMs in patient education. In addition, the results provide valuable data to support future research and applications in this domain, and they supply high-quality synthetic datasets for training in-house systems.