Recent reinforcement learning (RL) approaches have advanced radiology report generation (RRG), yet two core limitations persist: (1) report-level rewards offer limited evidence-grounded guidance for clinical faithfulness; and (2) current methods lack an explicit self-improving mechanism for aligning with clinical preferences. We introduce clinically aligned Evidence-aware Self-Correcting Reinforcement Learning (ESC-RL), which comprises two key components. First, a Group-wise Evidence-aware Alignment Reward (GEAR) delivers group-wise, evidence-aware feedback: it reinforces consistent grounding for true positives, recovers missed findings for false negatives, and suppresses unsupported content for false positives. Second, a Self-correcting Preference Learning (SPL) strategy automatically constructs a reliable, disease-aware preference dataset from multiple noisy observations and leverages an LLM to synthesize refined reports without human supervision. ESC-RL promotes clinically faithful, disease-aligned rewards and supports continual self-improvement during training. Extensive experiments on two public chest X-ray datasets demonstrate consistent gains and state-of-the-art performance.
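The abstract does not give GEAR's formula, so the following is only a minimal sketch of the grouping idea it describes: disease-level findings in a generated report are split into true positives, false negatives, and false positives relative to the reference, and each group contributes its own signal. The names `FindingGroups`, `group_findings`, and `toy_groupwise_reward`, the weights, and the normalization are all hypothetical choices made for illustration. They are not the authors' reward, which is additionally described as evidence-aware (grounded in image evidence), something this toy version does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FindingGroups:
    """Disease-level findings partitioned by agreement with the reference."""
    true_positives: frozenset
    false_negatives: frozenset
    false_positives: frozenset


def group_findings(predicted, reference):
    """Partition disease labels into TP / FN / FP groups (hypothetical helper)."""
    predicted, reference = frozenset(predicted), frozenset(reference)
    return FindingGroups(
        true_positives=predicted & reference,    # stated and present
        false_negatives=reference - predicted,   # present but missed
        false_positives=predicted - reference,   # stated but unsupported
    )


def toy_groupwise_reward(groups, tp_bonus=1.0, fn_penalty=1.0, fp_penalty=1.0):
    """Toy scalar reward; the weights are assumptions, not values from the paper."""
    n_tp = len(groups.true_positives)
    n_fn = len(groups.false_negatives)
    n_fp = len(groups.false_positives)
    total = n_tp + n_fn + n_fp
    if total == 0:
        return 0.0  # nothing to reward or penalize
    # Reward correct findings, penalize missed and unsupported ones.
    return (tp_bonus * n_tp - fn_penalty * n_fn - fp_penalty * n_fp) / total


groups = group_findings(
    predicted={"edema", "pneumonia"},
    reference={"edema", "cardiomegaly"},
)
print(sorted(groups.false_negatives))           # ['cardiomegaly']
print(round(toy_groupwise_reward(groups), 3))   # -0.333
```

Keeping the three groups separate, rather than collapsing them into one report-level score, is what lets each kind of error receive its own feedback; how that feedback is tied to visual evidence is left to the method itself.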