Visual Grounding of Whole Radiology Reports for 3D CT Images

Building a large-scale training dataset is an essential problem in the development of medical image recognition systems. Visual grounding techniques, which automatically associate objects in images with their corresponding descriptions, can facilitate the labeling of large numbers of images. However, visual grounding of radiology reports for CT images remains challenging, because many kinds of anomalies are detectable via CT imaging and the resulting report descriptions are long and complex. In this paper, we present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types. Our framework combines two components: 1) anatomical segmentation of images, and 2) report structuring. The anatomical segmentation provides multiple organ masks for a given CT image and helps the grounding model recognize detailed anatomies. The report structuring helps to accurately extract information regarding the presence, location, and type of each anomaly described in the corresponding report. Given these two additional image and report features, the grounding model can achieve better localization. For verification, we constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients. We evaluated our framework using grounding accuracy, defined as the percentage of correctly localized anomalies, and demonstrated that combining the anatomical segmentation and the report structuring improves performance by a large margin over the baseline model (77.8% vs. 66.0%). A comparison with prior techniques also showed that our method achieves higher performance.
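The grounding accuracy metric can be illustrated with a minimal sketch. The abstract defines it only as the percentage of correctly localized anomalies and does not state the criterion for a correct localization, so the code below assumes one for illustration: a predicted 3D mask counts as correct when its Dice overlap with the annotated region reaches a threshold. The function names `dice` and `grounding_accuracy` and the default threshold of 0.2 are hypothetical and are not taken from the paper.

```python
import numpy as np


def dice(pred, gt):
    """Dice overlap between two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Both masks are empty; treat this as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def grounding_accuracy(pred_masks, gt_masks, threshold=0.2):
    """Percentage of anomalies whose predicted mask is 'correctly localized'.

    ASSUMPTION: 'correct' means Dice >= threshold. The paper's actual
    criterion is not given in the abstract and may differ.
    """
    if len(pred_masks) != len(gt_masks):
        raise ValueError("pred_masks and gt_masks must have the same length")
    if len(pred_masks) == 0:
        return 0.0
    correct = sum(
        dice(p, g) >= threshold for p, g in zip(pred_masks, gt_masks)
    )
    return 100.0 * correct / len(pred_masks)
```

Each list entry corresponds to one anomaly description and its 3D mask over the CT volume, so the result is a percentage on the same scale as the reported 66.0% and 77.8%.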
