Concept-Enhanced Multimodal RAG: Towards Interpretable and Accurate Radiology Report Generation

Radiology Report Generation (RRG) through Vision-Language Models (VLMs) promises to reduce documentation burden, improve reporting consistency, and accelerate clinical workflows. However, clinical adoption of these models remains limited by their lack of interpretability and their tendency to hallucinate findings that are not supported by the imaging evidence. Existing research typically treats interpretability and accuracy as separate objectives: concept-based explainability techniques focus primarily on transparency, while Retrieval-Augmented Generation (RAG) methods target factual grounding through external retrieval. We present Concept-Enhanced Multimodal RAG (CEMRAG), a unified framework that decomposes visual representations into interpretable clinical concepts and integrates them with multimodal RAG. The approach uses the resulting enriched contextual prompts for RRG, improving both interpretability and factual accuracy. Experiments on MIMIC-CXR and IU X-Ray across multiple VLM architectures, training regimes, and retrieval configurations demonstrate consistent improvements over both conventional RAG and concept-only baselines on clinical accuracy metrics and standard NLP measures. These results challenge the assumed trade-off between interpretability and performance, showing that transparent visual concepts can enhance rather than compromise diagnostic accuracy in medical VLMs. Our modular design decomposes interpretability into visual transparency and structured language model conditioning, providing a principled pathway toward clinically trustworthy AI-assisted radiology.
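The abstract does not say how concepts are extracted or how the prompt is assembled, so the following is only a toy sketch of the general idea: score an image embedding against a bank of named clinical concepts, retrieve the most similar stored report, and place both in the prompt given to the language model. Cosine similarity, the three-dimensional toy vectors, and all names (`concept_bank`, `report_bank`, `build_prompt`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, items, k):
    """Return the k (label, score) pairs whose vectors are most similar to query."""
    scored = [(label, cosine(query, vec)) for label, vec in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(image_vec, concept_bank, report_bank, k_concepts=2, k_reports=1):
    """Assemble a prompt from top-scoring concepts and retrieved reports."""
    concepts = top_k(image_vec, concept_bank, k_concepts)
    reports = top_k(image_vec, report_bank, k_reports)
    lines = ["Visual concepts detected in the image:"]
    lines += [f"- {name} (similarity {score:.2f})" for name, score in concepts]
    lines.append("Reports retrieved for similar images:")
    lines += [f"- {text}" for text, _ in reports]
    lines.append("Write the findings section of the radiology report.")
    return "\n".join(lines), concepts, reports


# Hypothetical toy embeddings; a real system would take these from a vision-language encoder.
concept_bank = [
    ("pleural effusion", [1.0, 0.0, 0.0]),
    ("cardiomegaly", [0.0, 1.0, 0.0]),
    ("pneumothorax", [0.0, 0.0, 1.0]),
]
report_bank = [
    ("Small left pleural effusion. Heart size is enlarged.", [0.7, 0.7, 0.0]),
    ("No acute cardiopulmonary abnormality.", [0.1, 0.1, 0.1]),
    ("Right apical pneumothorax.", [0.0, 0.1, 0.9]),
]
image_vec = [0.8, 0.6, 0.0]

prompt, concepts, reports = build_prompt(image_vec, concept_bank, report_bank)
print(prompt)
```

In this sketch the named concepts and their scores appear verbatim in the prompt, so a reader can see which visual evidence the generator was conditioned on, while the retrieved report supplies the factual grounding that conventional RAG provides.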
