IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer

Automated medical report generation has become increasingly important in medical analysis. It can produce computer-aided diagnosis descriptions and thus significantly reduce doctors' workload. Inspired by the success of neural machine translation and image captioning, various deep learning methods have been proposed for medical report generation. However, owing to inherent properties of medical data, including data imbalance and the length of and correlation between report sequences, reports generated by existing methods may be linguistically fluent yet lack adequate clinical accuracy. In this work, we propose an image-to-indicator hierarchical transformer (IIHT) framework for medical report generation. It consists of three modules: a classifier module, an indicator expansion module and a generator module. The classifier module first extracts image features from the input medical images and produces disease-related indicators with their corresponding states. The disease-related indicators are subsequently utilised as input for the indicator expansion module, which follows a "data-text-data" strategy. The transformer-based generator then leverages these extracted features, along with image features as auxiliary information, to generate the final reports. Furthermore, the proposed IIHT method allows radiologists to modify disease indicators in real-world scenarios and integrates these modifications into the indicator expansion module for fluent and accurate medical report generation. Extensive experiments and comparisons with state-of-the-art methods under various evaluation metrics demonstrate the effectiveness of the proposed method.
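
As a rough illustration of the "data-text-data" idea described above, the following minimal sketch turns indicator states (data) into short phrases (text) and then into integer token ids (data) that a generator could consume. It is not the authors' implementation: the indicator names, the state labels, the phrase template, and the helper names `Indicator`, `expand_to_text`, `build_vocab`, `encode` and `override` are all hypothetical, since the abstract does not specify them. The `override` helper mimics a radiologist correcting one indicator before expansion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """One disease-related indicator and its state (both names are assumptions)."""
    name: str
    state: str


def expand_to_text(indicators):
    """Data -> text: render each (indicator, state) pair as a short phrase.

    The "<name> is <state>" template is a placeholder, not the paper's wording.
    """
    return [f"{ind.name} is {ind.state}" for ind in indicators]


def build_vocab(phrases):
    """Assign an integer id to every distinct word; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for phrase in phrases:
        for word in phrase.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(phrases, vocab):
    """Text -> data: map each phrase to a list of token ids."""
    return [[vocab[word] for word in phrase.split()] for phrase in phrases]


def override(indicators, name, new_state):
    """Return a copy of the indicator list with one state replaced by the user."""
    return [
        Indicator(ind.name, new_state) if ind.name == name else ind
        for ind in indicators
    ]


# Hypothetical classifier output for one study.
predicted = [
    Indicator("cardiomegaly", "negative"),
    Indicator("pleural effusion", "positive"),
]

# A radiologist disagrees with one prediction and edits it before expansion.
corrected = override(predicted, "cardiomegaly", "positive")

phrases = expand_to_text(corrected)
vocab = build_vocab(phrases)
token_ids = encode(phrases, vocab)
print(phrases)
print(token_ids)
```

In a full system the token ids would be embedded and passed to the generator together with the image features; here they only show that an edited indicator changes the generator's input without rerunning the classifier.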
