Patient-Centered Summarization Framework for AI Clinical Summarization: A Mixed-Methods Design

Maria Lizarazo Jimenez, Ana Gabriela Claros, Kieran Green, David Toro-Tobon, Felipe Larios, Sheena Asthana, Camila Wenczenovicz, Kerly Guevara Maldonado, Luis Vilatuna-Andrango, Cristina Proano-Velez, Satya Sai Sri Bandi, Shubhangi Bagewadi, Megan E. Branda, Misk Al Zahidy, Saturnino Luz, Mirella Lapata, Juan P. Brito, Oscar J. Ponce-Ponte

From arXiv. The first two listed authors contributed equally. Pages: 21; Figures: 2; Tables: 3.

Large Language Models (LLMs) are increasingly demonstrating the potential to reach human-level performance in generating clinical summaries from patient-clinician conversations. However, these summaries often focus on patients' biology rather than their preferences, values, wishes, and concerns. To achieve patient-centered care, we propose a new standard for Artificial Intelligence (AI) clinical summarization tasks: Patient-Centered Summaries (PCS). Our objective was to develop a framework to generate PCS that capture patient values and ensure clinical utility, and to assess whether current open-source LLMs can achieve human-level performance in this task. We used a mixed-methods process. Two Patient and Public Involvement groups (10 patients and 8 clinicians) in the United Kingdom participated in semi-structured interviews exploring what personal and contextual information should be included in clinical summaries and how it should be structured for clinical use. Findings informed annotation guidelines used by eight clinicians to create gold-standard PCS from 88 atrial fibrillation consultations. Sixteen consultations were used to refine a prompt aligned with the guidelines. Five open-source LLMs (Llama-3.2-3B, Llama-3.1-8B, Mistral-8B, Gemma-3-4B, and Qwen3-8B) generated summaries for the remaining 72 consultations using zero-shot and few-shot prompting; the summaries were evaluated with ROUGE-L, BERTScore, and qualitative metrics. Patients emphasized lifestyle routines, social support, recent stressors, and care values. Clinicians sought concise functional, psychosocial, and emotional context. In the zero-shot setting, Mistral-8B achieved the highest ROUGE-L (0.189) and Llama-3.1-8B the highest BERTScore (0.673); in the few-shot setting, Llama-3.1-8B led on both metrics (ROUGE-L 0.206, BERTScore 0.683). Expert-written and model-generated summaries were rated similarly for completeness and fluency, whereas the human-written PCS scored higher on correctness and patient-centeredness.
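ROUGE-L, one of the two automatic metrics named above, scores a generated summary by the longest common subsequence (LCS) of tokens it shares with the gold-standard reference. The block below is a minimal sketch, assuming lowercase whitespace tokenization and the balanced F1 form of the metric. The function names `lcs_length` and `rouge_l_f1` are hypothetical. The paper's exact configuration (tokenizer, stemming, summary-level vs. sentence-level LCS) is not stated here, so scores from this sketch are not directly comparable to the reported values.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (rolling-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS of both prefixes by one.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise keep the better of dropping a token from either side.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sketch of ROUGE-L F1 (assumption: lowercase + whitespace tokenization, beta = 1)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge_l_f1("patient walks daily with her dog",
                       "the patient walks her dog daily")
    print(round(score, 3))  # 0.667 (LCS = "patient walks her dog", 4 of 6 tokens each)
```

Because ROUGE-L only rewards in-order token overlap, a summary can paraphrase a patient's values faithfully and still score low. This is one reason the study pairs it with BERTScore and qualitative ratings.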
