Clinical decision support requires not only correct answers but also clinically valid reasoning. We propose Differential Reasoning Learning (DRL), a framework that improves clinical agents by learning from reasoning discrepancies. Given reference reasoning rationales (e.g., physician-authored clinical rationales, clinical guidelines, or outputs from more capable models) and the agent's free-form chain-of-thought (CoT), DRL extracts reasoning graphs, represented as directed acyclic graphs (DAGs), and analyzes their discrepancies using a clinically weighted graph edit distance (GED). An LLM-as-a-judge aligns semantically equivalent nodes and diagnoses the discrepancies between the graphs. These graph-level diagnostics are converted into natural-language instructions and stored in a Differential Reasoning Knowledge Base (DR-KB). At inference, we retrieve the top-$k$ instructions via Retrieval-Augmented Generation (RAG) to augment the agent prompt and patch likely logic gaps. Evaluation on open medical question answering (QA) benchmarks and on a Return Visit Admissions (RVA) prediction task built from internal clinical data shows gains over baselines in both final-answer accuracy and reasoning fidelity. Ablation studies confirm that both the infusion of reference reasoning rationales and the top-$k$ retrieval strategy contribute to these gains. A review of the outputs by clinicians provides further assurance of the approach. Together, these results suggest that DRL supports more reliable clinical decision-making in complex reasoning scenarios and offers a practical mechanism for deployment under limited token budgets.