Radiology Report Generation (RRG) has drawn attention as a task at the intersection of vision and language. Previous works inherit the paradigm of vision-to-language generation tasks, aiming to generate highly consistent paragraphs as reports. However, one unique characteristic of RRG, the independence between diseases, has been neglected, which introduces a spurious confounder: disease co-occurrence. This confounder is all the more harmful to report generation because the RRG data distribution is biased. In this paper, we rethink this issue by reasoning about its causes and effects from a novel perspective of statistics and causality, identifying the Joint Vision Coupling and the Conditional Sentence Coherence Coupling as two aspects prone to implicitly decrease the accuracy of reports. We then propose a counterfactual augmentation strategy, comprising the Counterfactual Sample Synthesis and the Counterfactual Report Reconstruction sub-methods, to break these two aspects of spurious effects. Experimental results and further analyses on two widely used datasets support our reasoning and proposed methods.