Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives

Automatic radiology report generation is challenging because medical images or reports are usually similar to one another owing to their shared anatomical content. This makes it hard for a model to capture the uniqueness of individual images and leaves it prone to producing undesired generic or mismatched reports. This situation calls for learning more discriminative features that can capture even fine-grained mismatches between images and reports. To achieve this, this paper proposes a novel framework that learns discriminative image and report features by distinguishing them from their closest peers, i.e., hard negatives. Specifically, to attain more discriminative features, we gradually raise the difficulty of this learning task by creating increasingly hard negative reports for each image in the feature space during training. By treating the increasingly hard negatives as auxiliary variables, we formulate this process as a min-max alternating optimisation problem. At each iteration, conditioned on a given set of hard negative reports, image and report features are learned as usual by minimising the loss functions related to report generation. After that, a new set of harder negative reports is created by maximising a loss reflecting image-report alignment. By solving this optimisation, we attain a model that can generate more specific and accurate reports. It is noteworthy that our framework enhances discriminative feature learning without introducing extra network weights. Also, in contrast to existing ways of generating hard negatives, our framework extends beyond the granularity of the dataset by generating harder samples that lie outside the training set. Experiments on benchmark datasets verify the efficacy of our framework and show that it can serve as a plug-in to readily improve existing medical report generation models.
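
The alternating min-max procedure described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: the names (`alignment_loss`, `min_step`, `max_step`, `train`), the elementwise-scaling "encoder", the softplus ranking loss, and the step sizes are all hypothetical. The report generation losses minimised in the actual framework are omitted, so the minimisation step below acts on the alignment loss alone.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(w, image):
    # Toy image encoder (assumption): elementwise scaling of the image feature.
    return [wi * xi for wi, xi in zip(w, image)]

def alignment_loss(w, image, pos, neg):
    # Softplus ranking loss (assumption): small when the image is more
    # similar to its true report `pos` than to the negative report `neg`.
    f = encode(w, image)
    return math.log1p(math.exp(dot(f, neg) - dot(f, pos)))

def min_step(w, image, pos, neg, lr=0.1):
    # Minimisation: gradient descent on the weights, negative held fixed.
    f = encode(w, image)
    g = sigmoid(dot(f, neg) - dot(f, pos))
    return [wi - lr * g * xi * (ni - pi)
            for wi, xi, ni, pi in zip(w, image, neg, pos)]

def max_step(w, image, pos, neg, lr=0.1):
    # Maximisation: gradient ascent on the negative report feature itself,
    # weights held fixed. The result need not match any training report.
    f = encode(w, image)
    g = sigmoid(dot(f, neg) - dot(f, pos))
    return [ni + lr * g * fi for ni, fi in zip(neg, f)]

def train(w, image, pos, neg, iterations=5):
    history = []
    for _ in range(iterations):
        w = min_step(w, image, pos, neg)
        after_min = alignment_loss(w, image, pos, neg)
        neg = max_step(w, image, pos, neg)
        after_max = alignment_loss(w, image, pos, neg)
        history.append((after_min, after_max))
    return w, neg, history

w, neg, history = train(w=[1.0, 1.0], image=[1.0, 0.5],
                        pos=[1.0, 0.0], neg=[0.0, 1.0])
```

In this sketch the negative is updated directly in feature space, which mirrors the abstract's point that harder negatives can lie outside the training set, and no parameters beyond the encoder weights `w` are introduced. Each pair in `history` records the loss after the minimisation step and after the subsequent maximisation step, so the second value of each pair shows how much harder the negative has become.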
