Automatically summarizing radiology reports into a concise impression can reduce the manual burden on clinicians and improve the consistency of reporting. Previous work aimed to enhance content selection and factuality through guided abstractive summarization. However, two key issues persist. First, current methods rely heavily on domain-specific resources to extract the guidance signal, limiting their transferability to domains and languages where those resources are unavailable. Second, while automatic metrics like ROUGE show progress, we lack a good understanding of the errors and failure modes in this task. To bridge these gaps, we first propose a domain-agnostic guidance signal in the form of variable-length extractive summaries. Our empirical results on two English benchmarks demonstrate that this guidance signal improves upon unguided summarization while being competitive with domain-specific methods. Additionally, we run an expert evaluation of four systems according to a taxonomy of 11 fine-grained errors. We find that the most pressing differences between automatic summaries and those of radiologists relate to content selection, including omissions (up to 52%) and additions (up to 57%). We hypothesize that latent reporting factors and corpus-level inconsistencies may prevent models from reliably learning content selection from the available data, presenting promising directions for future work.
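The abstract does not specify how the variable-length extractive summaries are obtained. One widely used way to build such a signal for training is a greedy oracle: repeatedly add the report sentence that most improves overlap with the reference impression, and stop as soon as no sentence helps, so the number of selected sentences varies per report. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The names `greedy_oracle` and `rouge1_f1` are hypothetical, and whitespace-token unigram F1 is a simplified stand-in for ROUGE-1 (no stemming, no punctuation handling).

```python
from collections import Counter


def rouge1_f1(candidate: list[str], reference: list[str]) -> float:
    """Hypothetical helper: unigram-overlap F1, a simplified stand-in for ROUGE-1."""
    if not candidate or not reference:
        return 0.0
    # Clipped unigram matches (multiset intersection of token counts).
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(sentences: list[str], reference: str) -> list[int]:
    """Hypothetical sketch of a variable-length extractive oracle.

    Greedily adds the sentence that most improves unigram F1 against the
    reference and stops when no remaining sentence improves the score,
    so the extract length differs from report to report.
    """
    ref_tokens = reference.lower().split()
    tokenized = [s.lower().split() for s in sentences]
    selected: list[int] = []
    best_score = 0.0
    while True:
        best_idx = None
        for i in range(len(tokenized)):
            if i in selected:
                continue
            # Score the extract in original document order.
            candidate = [t for j in sorted(selected + [i]) for t in tokenized[j]]
            score = rouge1_f1(candidate, ref_tokens)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            return sorted(selected)
        selected.append(best_idx)


if __name__ == "__main__":
    findings = [
        "no acute cardiopulmonary process.",
        "heart size is normal.",
        "small left pleural effusion.",
    ]
    impression = "small left pleural effusion. no acute process."
    print(greedy_oracle(findings, impression))  # [0, 2]
```

In this toy example the oracle keeps two of the three sentences; a report whose impression overlaps with only one finding would yield a one-sentence extract. Because the reference impression is unavailable at inference time, a guided system built this way would typically need to predict such a selection itself; that step is not shown here.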