Extrinsically-Focused Evaluation of Omissions in Medical Summarization

The goal of automated summarization techniques (Paice, 1990; Kupiec et al., 1995) is to condense text by focusing on the most critical information. Generative large language models (LLMs) have been shown to be robust summarizers, yet traditional metrics struggle to capture the resulting performance of more powerful LLMs (Goyal et al., 2022). In safety-critical domains such as medicine, more rigorous evaluation is required, especially given the potential for LLMs to omit important information from the resulting summary. We propose MED-OMIT, a new omission benchmark for medical summarization. Given a doctor-patient conversation and a generated summary, MED-OMIT categorizes the chat into a set of facts and identifies which are omitted from the summary. We further propose to determine fact importance by simulating the impact of each fact on a downstream clinical task: differential diagnosis (DDx) generation. MED-OMIT leverages LLM prompt-based approaches that categorize the importance of facts and cluster them as supporting or negating evidence for the diagnosis. We evaluate MED-OMIT on a publicly released dataset of patient-doctor conversations and find that MED-OMIT captures omissions better than alternative metrics.
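The omission-scoring idea described above can be illustrated with a minimal Python sketch. It is not the MED-OMIT implementation: the abstract does not specify a scoring formula, so the `Fact` type, the numeric `importance` weights, the exact-match coverage test, and the function names `find_omissions` and `weighted_omission_score` are all assumptions made for illustration. In the described approach, fact extraction, importance categorization, and omission detection are performed with LLM prompts, which are stubbed out here as precomputed inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One fact extracted from the conversation (hypothetical structure)."""
    text: str
    importance: float         # assumed weight in [0, 1]; stands in for an LLM-assigned category
    supports_diagnosis: bool  # True = supporting evidence, False = negating evidence


def find_omissions(facts, covered_texts):
    """Return the facts whose text is absent from the set covered by the summary.

    Exact string matching is a placeholder for the LLM-based check.
    """
    covered = set(covered_texts)
    return [f for f in facts if f.text not in covered]


def weighted_omission_score(facts, omitted):
    """Share of total importance weight carried by the omitted facts."""
    total = sum(f.importance for f in facts)
    if total == 0:
        return 0.0
    return sum(f.importance for f in omitted) / total


if __name__ == "__main__":
    facts = [
        Fact("patient reports chest pain", 0.5, True),
        Fact("patient denies fever", 0.25, False),
        Fact("patient owns a dog", 0.25, False),
    ]
    covered = ["patient reports chest pain", "patient owns a dog"]
    omitted = find_omissions(facts, covered)
    print([f.text for f in omitted])
    print(weighted_omission_score(facts, omitted))
```

Weighting by importance means that dropping a clinically relevant fact is penalized more than dropping an incidental one, which is the motivation the abstract gives for tying importance to the downstream DDx task.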

