Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from an X-ray image, which could relieve radiologists of the heavy burden of report writing. Although various image captioning methods have shown remarkable performance on natural images, generating accurate reports for medical images requires knowledge of multiple modalities, including vision, language, and medical terminology. We propose a Knowledge-injected U-Transformer (KiUT) that learns multi-level visual representations and adaptively distills this information with contextual and clinical knowledge for word prediction. Specifically, a U-connection schema between the encoder and decoder is designed to model interactions between the different modalities. In addition, a symptom graph and an injected knowledge distiller are developed to assist report generation. Experimentally, KiUT outperforms state-of-the-art methods on two widely used benchmark datasets: IU-Xray and MIMIC-CXR. Further experimental results demonstrate the advantages of our architecture and the complementary benefits of the injected knowledge.