A radiology report comprises presentation-style vocabulary, which ensures clarity and organization, and factual vocabulary, which provides accurate and objective descriptions based on observable findings. While manually writing these reports is time-consuming and labor-intensive, automatic report generation offers a promising alternative. A critical step in this process is to align radiographs with their corresponding reports. However, existing methods often rely on complete reports for alignment, overlooking the impact of presentation-style vocabulary. To address this issue, we propose FSE, a two-stage Factual Serialization Enhancement method. In Stage 1, we introduce factuality-guided contrastive learning for visual representation by maximizing the semantic correspondence between radiographs and their corresponding factual descriptions. In Stage 2, we present evidence-driven report generation that enhances diagnostic accuracy by integrating insights from similar historical cases structured as factual serialization. Experiments on the MIMIC-CXR and IU X-ray datasets across specific and general scenarios demonstrate that FSE outperforms state-of-the-art approaches in both natural language generation and clinical efficacy metrics. Ablation studies further confirm the positive effects of factual serialization in both Stage 1 and Stage 2. The code is available at https://github.com/mk-runner/FSE.
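To make the two stages more concrete, the following is a minimal sketch and not the FSE implementation. It assumes that Stage 1 can be approximated by a symmetric InfoNCE loss over paired radiograph and factual-description embeddings (a common choice for image-text contrastive learning; the abstract does not name the loss), and that Stage 2 finds similar historical cases by cosine similarity. All function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_matrix(rows_a, rows_b):
    """Pairwise cosine similarities between two lists of vectors."""
    a = [l2_normalize(v) for v in rows_a]
    b = [l2_normalize(v) for v in rows_b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(image_emb, fact_emb, temperature=0.07):
    """Assumed Stage 1 objective: pull each radiograph embedding toward the
    embedding of its own factual description and away from the others."""
    sim = cosine_matrix(image_emb, fact_emb)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


def retrieve_similar(query_emb, case_embs, k=2):
    """Assumed Stage 2 retrieval: indices of the k most similar historical cases."""
    sims = cosine_matrix([query_emb], case_embs)[0]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return ranked[:k]


# Toy 2-D embeddings (hypothetical data, for illustration only).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_facts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_facts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_info_nce(images, aligned_facts)
loss_shuffled = symmetric_info_nce(images, shuffled_facts)
neighbours = retrieve_similar([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]], k=2)
```

In this toy setting the loss is lower when each radiograph is paired with its own factual description than when the pairs are shuffled, which is the behaviour a contrastive alignment objective is meant to reward. In a real system the retrieved indices would select the factual serializations of historical cases passed to the report generator as evidence.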