The impression in a radiology report is crucial for referring physicians to grasp key information, since it is concluded from the radiologist's findings and reasoning. To alleviate radiologists' workload and reduce repetitive human labor in impression writing, many researchers have focused on automatic impression generation. However, recent work on this task mainly summarizes the corresponding findings and pays less attention to the radiology images. In clinical practice, radiographs can provide more detailed and valuable observations to support radiologists' impression writing, especially for complicated cases. Moreover, each sentence in the findings usually focuses on a single anatomy, so it only needs to be matched to the corresponding anatomical region rather than the whole image, which benefits the alignment of textual and visual features. Therefore, we propose a novel anatomy-enhanced multimodal model to promote impression generation. In detail, we first construct a set of rules to extract anatomies and insert them as prompts into each sentence to highlight anatomy characteristics. Then, two separate encoders are applied to extract features from the radiograph and the findings. Afterward, we utilize a contrastive learning module to align these two representations at the overall level and use a co-attention module to fuse them at the sentence level with the help of the anatomy-enhanced sentence representations. Finally, the decoder takes the fused information as input to generate impressions. Experimental results on two benchmark datasets confirm the effectiveness of the proposed method, which achieves state-of-the-art results.
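As a rough illustration of the rule-based anatomy prompting step described above, the following minimal Python sketch tags each findings sentence with a matched anatomy. The keyword table `ANATOMY_RULES`, the bracketed prompt format, and all function names are assumptions made for this example; they are not the rule set or the prompt format used in the paper.

```python
import re

# Hypothetical keyword rules: anatomy label -> trigger words.
# Illustrative only; not the rule set used in the paper.
ANATOMY_RULES = {
    "lung": ["lung", "lungs", "pulmonary", "consolidation"],
    "heart": ["heart", "cardiac", "cardiomegaly"],
    "pleura": ["pleural", "effusion"],
    "bone": ["rib", "ribs", "spine", "osseous", "fracture"],
}


def split_sentences(findings):
    """Split a findings paragraph into sentences on end punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", findings)
    return [s.strip() for s in parts if s.strip()]


def detect_anatomy(sentence):
    """Return the first anatomy whose keywords occur in the sentence, else None."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    for anatomy, keywords in ANATOMY_RULES.items():
        if tokens & set(keywords):
            return anatomy
    return None


def add_anatomy_prompts(findings):
    """Prefix each sentence with its anatomy prompt (assumed '[anatomy]' format)."""
    prompted = []
    for sentence in split_sentences(findings):
        anatomy = detect_anatomy(sentence)
        prompted.append(f"[{anatomy}] {sentence}" if anatomy else sentence)
    return prompted


if __name__ == "__main__":
    report = "The lungs are clear. The heart size is normal. No pleural effusion."
    for line in add_anatomy_prompts(report):
        print(line)
```

In this sketch a sentence with no matching keyword is left unprompted, and only the first matching anatomy is used. How the actual model handles sentences that mention several anatomies or none is not specified in the abstract.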