Radiology Report Generation (RRG) aims to produce accurate and coherent diagnostic reports from medical images. Although large vision-language models (LVLMs) improve report fluency and accuracy, they are prone to hallucination: they generate plausible pathological details that are not grounded in the image. Existing methods rely primarily on external knowledge guidance to align the generated text with visual information. However, these approaches often ignore the decoding priors and vision-language alignment biases inherent in pretrained models, and their reliance on constructed guidance limits robustness. In this paper, we propose Layer-wise Expert-aligned Decoding (LEAD), a novel method that intrinsically modifies the LVLM decoding trajectory. A multi-expert module extracts distinct pathological features, which are integrated into each decoder layer through a learned gating mechanism. This layer-wise design lets the LLM consult expert features at every inference step, dynamically rectifying decoding biases and steering generation toward factual consistency. Experiments on multiple public datasets show that LEAD improves clinical accuracy metrics and mitigates hallucinations while preserving generation quality.
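The layer-wise gated integration described in the abstract can be illustrated with a minimal sketch. The abstract does not give the gating formula, so everything below is an assumption made for illustration: each expert feature is added to the hidden state as a residual scaled by a sigmoid gate computed from the concatenated hidden state and expert feature, and the decoder layer itself is replaced by an identity placeholder. The names `gated_expert_fusion` and `decode_step` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_expert_fusion(
    hidden: Sequence[float],
    expert_feats: Sequence[Sequence[float]],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: float = 0.0,
) -> Tuple[Vector, List[float]]:
    """Add each expert feature to the hidden state, scaled by a scalar gate.

    Assumed form (not taken from the paper):
        g_k = sigmoid(w_k . [hidden; e_k] + b)
        fused = hidden + sum_k g_k * e_k
    Each w_k has length 2 * d, where d is the hidden size.
    """
    fused = list(hidden)
    gates = []
    for feat, w in zip(expert_feats, gate_weights):
        # The gate is computed from the layer input, not the running sum.
        g = sigmoid(dot(w, list(hidden) + list(feat)) + gate_bias)
        gates.append(g)
        fused = [f + g * e for f, e in zip(fused, feat)]
    return fused, gates


def decode_step(
    hidden: Sequence[float],
    expert_feats: Sequence[Sequence[float]],
    layer_gate_weights: Sequence[Sequence[Sequence[float]]],
) -> Vector:
    """Consult the expert features at every layer of one decoding step.

    The decoder layer computation is omitted (identity placeholder), so
    only the effect of the per-layer gated injection is visible.
    """
    state = list(hidden)
    for gate_weights in layer_gate_weights:
        state, _ = gated_expert_fusion(state, expert_feats, gate_weights)
    return state


if __name__ == "__main__":
    h = [1.0, 2.0]
    experts = [[2.0, 0.0], [0.0, 4.0]]       # two toy expert features
    zero_w = [[0.0] * 4, [0.0] * 4]          # zero weights give gates of 0.5
    print(gated_expert_fusion(h, experts, zero_w))
    print(decode_step(h, experts, [zero_w, zero_w]))
```

In this sketch a strongly negative gate bias drives every gate toward zero, so the hidden state passes through unchanged; a gate of this kind lets a layer ignore expert features that are not relevant to the current token.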