In the realm of medical report generation (MRG), natural language processing has emerged as a vital tool for alleviating the workload of radiologists. Despite the impressive natural language understanding demonstrated by large vision language models (LVLMs), their susceptibility to generating plausible yet inaccurate claims, known as ``hallucinations'', raises concerns, especially in the nuanced and critical field of medicine. In this work, we introduce a framework, \textbf{K}nowledge-\textbf{E}nhanced with Fine-Grained \textbf{R}einforced Rewards \textbf{M}edical Report Generation (KERM), to tackle this issue. Our approach enriches the input to the LVLM by first using MedCLIP to retrieve relevant lesion fact sentences from a curated knowledge corpus. We then introduce a novel purification module to ensure the retrieved knowledge is relevant to the patient's clinical context. Subsequently, we employ fine-grained rewards to guide the model toward generating well-supported and clinically relevant descriptions, aligning its outputs with desired behaviors. Experimental results on the IU-Xray and MIMIC-CXR datasets validate the effectiveness of our approach in mitigating hallucinations and enhancing report quality.