A medical provider's summary of a patient visit serves several critical purposes: it supports clinical decision-making, facilitates hand-offs between providers, and acts as a reference for the patient. An effective summary must be coherent and must accurately capture all the medically relevant information in the dialogue, despite the complexity of patient-generated language. Even minor inaccuracies in visit summaries (for example, summarizing "patient does not have a fever" when a fever is present) can be detrimental to the outcome of care for the patient. This paper tackles the problem of medical conversation summarization by breaking the task into several smaller dialogue-understanding tasks that build on one another sequentially. First, we identify medical entities and their affirmations within the conversation to serve as building blocks. We study dynamically constructing few-shot prompts for each task by conditioning on relevant patient information, and we use GPT-3 as the backbone for our experiments. We also develop GPT-derived summarization metrics to quantitatively measure performance against reference summaries. Both our human evaluation study and our metrics for medical correctness show that summaries generated using this approach are clinically accurate and outperform the baseline approach of summarizing the dialogue in a zero-shot, single-prompt setting.
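As a rough illustration of dynamically constructing a few-shot prompt conditioned on patient information, the sketch below picks the labeled examples whose medical entities overlap most with those of the incoming dialogue and places them ahead of the new dialogue. The abstract does not say how examples are selected, so the Jaccard-overlap ranking, the prompt layout, and every name here (`Example`, `select_examples`, `build_prompt`) are assumptions made for illustration, not the authors' method. No model call is made; the function only assembles the prompt string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A hypothetical labeled example: a dialogue, its entities, and a summary."""
    dialogue: str
    entities: frozenset[str]
    summary: str


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two entity sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(pool: list[Example], query_entities: frozenset[str],
                    k: int = 2) -> list[Example]:
    """Return the k pool examples whose entities best match the query.

    Assumption: similarity is plain Jaccard overlap of entity sets.
    """
    ranked = sorted(pool, key=lambda ex: jaccard(ex.entities, query_entities),
                    reverse=True)
    return ranked[:k]


def build_prompt(pool: list[Example], dialogue: str,
                 query_entities: frozenset[str], k: int = 2) -> str:
    """Assemble a few-shot prompt: selected examples first, new dialogue last."""
    parts = [f"Dialogue: {ex.dialogue}\nSummary: {ex.summary}"
             for ex in select_examples(pool, query_entities, k)]
    parts.append(f"Dialogue: {dialogue}\nSummary:")
    return "\n\n".join(parts)
```

In this sketch the entity set of the new dialogue would come from the earlier entity-identification step, so the prompt changes from patient to patient instead of relying on one fixed set of examples.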