Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study

Niklas Mannhardt, Elizabeth Bondi-Kelly, Barbara Lam, Chloe O'Connell, Mercy Asiedu, Hussein Mozannar, Monica Agrawal, Alejandro Buendia, Tatiana Urman, Irbaz B. Riaz, Catherine E. Ricciardi, Marzyeh Ghassemi, David Sontag

Patients derive numerous benefits from reading their clinical notes, including an increased sense of control over their health and improved understanding of their care plan. However, complex medical concepts and jargon within clinical notes hinder patient comprehension and may lead to anxiety. We developed a patient-facing tool to make clinical notes more readable, leveraging large language models (LLMs) to simplify, extract information from, and add context to notes. We prompt-engineered GPT-4 to perform these augmentation tasks on real clinical notes donated by breast cancer survivors and synthetic notes generated by a clinician, for a total of 12 notes comprising 3868 words. In June 2023, 200 female-identifying US-based participants were each randomly assigned three clinical notes with varying levels of augmentation produced by our tool. Participants answered questions about each note, which evaluated their understanding of follow-up actions and their self-reported confidence. We found that augmentations were associated with a significant increase in action understanding score (0.63 $\pm$ 0.04 for select augmentations, compared to 0.54 $\pm$ 0.02 for the control; p=0.002). In-depth interviews of self-identifying breast cancer patients (N=7) were also conducted via video conferencing. Augmentations, especially definitions, elicited positive responses among the seven interviewees, although some raised concerns about relying on LLMs. Clinicians evaluated the augmentations for errors, and we found that misleading errors do occur; errors were more common in real donated notes than in synthetic notes, illustrating the importance of carefully written clinical notes. Augmentations improve some but not all readability metrics. This work demonstrates the potential of LLMs to improve patients' experience with clinical notes at a lower burden to clinicians. However, having a human in the loop is important to correct potential model errors.
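
The abstract does not name the readability metrics that were used. As an illustration only, the sketch below computes one widely used metric, the Flesch-Kincaid grade level, for a jargon-heavy sentence and a plain-language rewrite. The choice of metric, the function names `count_syllables` and `flesch_kincaid_grade`, the vowel-group syllable heuristic, and both example sentences are assumptions made for this example; they are not taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption of this sketch):
    count groups of consecutive vowels and drop one for a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Hypothetical sentences written for this sketch, not drawn from the study's notes.
original = "Patient presents with bilateral axillary lymphadenopathy."
simplified = "The patient has swollen lymph nodes in both armpits."

if __name__ == "__main__":
    print(f"original:   {flesch_kincaid_grade(original):.1f}")
    print(f"simplified: {flesch_kincaid_grade(simplified):.1f}")
```

A lower grade level suggests text that is easier to read. A score like this reflects only sentence length and word length, not whether a rewrite is clinically accurate, so it cannot substitute for the clinician error review described above.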
