Because radiology images share common anatomical content, the images and their corresponding reports exhibit high similarity. This inherent data bias can predispose automatic report generation models to learn entangled, spurious representations, resulting in misdiagnostic reports. To tackle this issue, we propose a novel \textbf{Co}unter\textbf{F}actual \textbf{E}xplanations-based framework (CoFE) for radiology report generation. Counterfactual explanations are a potent tool for understanding how an algorithm's decisions can be changed by asking ``what if'' questions. Leveraging this concept, CoFE learns non-spurious visual representations by contrasting the representations of factual and counterfactual images. Specifically, we derive counterfactual images by swapping patches between a positive and a negative sample until the predicted diagnosis shifts; here, the positive and negative samples are the most semantically similar pair with different diagnosis labels. Additionally, CoFE employs a learnable prompt that encapsulates both factual and counterfactual content to efficiently fine-tune a pre-trained large language model, yielding a more generalizable prompt representation. Extensive experiments on two benchmarks demonstrate that leveraging counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports, outperforming prior methods on both language generation and clinical efficacy metrics.
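The patch-swapping procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the classifier `predict`, the grid-order patch traversal, and the use of NumPy arrays for images are all assumptions made for clarity.

```python
import numpy as np

def make_counterfactual(pos_img, neg_img, predict, patch=2):
    """Swap patches from a negative sample into a positive sample
    until the predicted diagnosis shifts (illustrative sketch)."""
    cf = pos_img.copy()               # start from the factual image
    orig = predict(cf)                # original predicted diagnosis
    h, w = cf.shape[:2]
    for y in range(0, h, patch):      # simple grid-order traversal (assumption)
        for x in range(0, w, patch):
            cf[y:y + patch, x:x + patch] = neg_img[y:y + patch, x:x + patch]
            if predict(cf) != orig:
                return cf, True       # diagnosis shifted: counterfactual found
    return cf, False                  # no shift occurred

# Toy usage with a hypothetical threshold "classifier":
pos = np.zeros((4, 4))
neg = np.ones((4, 4))
predict = lambda img: int(img.mean() > 0.5)
cf, shifted = make_counterfactual(pos, neg, predict)
```

In the toy run, patches are copied in until the mean-intensity "diagnosis" flips, after which the loop stops; the partially swapped image serves as the counterfactual whose representation is contrasted against the factual one.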