Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives

Automatic radiology report generation is challenging because medical images and reports tend to resemble one another owing to their shared anatomical content. This makes it hard for a model to capture the uniqueness of individual images and leaves it prone to producing generic or mismatched reports. The situation calls for learning more discriminative features that can capture even fine-grained mismatches between images and reports. To this end, this paper proposes a novel framework that learns discriminative image and report features by distinguishing them from their closest peers, i.e., hard negatives. In particular, to attain more discriminative features, we gradually raise the difficulty of this learning task by creating increasingly hard negative reports for each image in the feature space during training. By treating the increasingly hard negatives as auxiliary variables, we formulate this process as a min-max alternating optimisation problem. At each iteration, conditioned on a given set of hard negative reports, image and report features are learned as usual by minimising the loss functions related to report generation. A new set of harder negative reports is then created by maximising a loss reflecting image-report alignment. By solving this optimisation problem, we obtain a model that generates more specific and accurate reports. Notably, our framework enhances discriminative feature learning without introducing extra network weights. Moreover, in contrast to existing ways of generating hard negatives, our framework extends beyond the granularity of the dataset by generating harder samples that lie outside the training set. Experiments on benchmark datasets verify the efficacy of our framework and show that it can serve as a plug-in to readily improve existing medical report generation models.
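To make the maximisation step concrete, the following is a minimal sketch of how a negative report feature might be made harder in the feature space. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify the alignment loss or the update rule, so the hinge (triplet) loss, the dot-product similarity, the step size, and the names `alignment_loss` and `harden_negative` are all hypothetical. The minimisation step (training the report generator) is omitted.

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalise(a):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]


def alignment_loss(image, positive, negative, margin=0.3):
    """Assumed hinge loss: grows as the negative report becomes
    as similar to the image as the true (positive) report."""
    return max(0.0, margin - dot(image, positive) + dot(image, negative))


def harden_negative(image, negative, step=0.1, n_steps=3):
    """Assumed max step: move the negative feature towards the image feature.

    The gradient of dot(image, negative) w.r.t. the negative is the image
    feature itself, which is also the gradient of the hinge loss whenever
    the hinge is active. Each update is re-normalised to unit length.
    """
    for _ in range(n_steps):
        negative = normalise([n + step * v for n, v in zip(negative, image)])
    return negative


# Toy unit-length features (made-up values, for illustration only).
image = [1.0, 0.0, 0.0]
positive = [0.8, 0.6, 0.0]   # feature of the matching report
negative = [0.6, 0.8, 0.0]   # closest non-matching report in the dataset

harder = harden_negative(image, negative)
loss_before = alignment_loss(image, positive, negative)
loss_after = alignment_loss(image, positive, harder)
print(loss_before, loss_after)
```

The hardened feature does not correspond to any report in the dataset, which is one way to read the claim that harder samples are generated outside the training set. In an alternating scheme, the model would next be trained against `harder`, and the process would repeat.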
