Current medical retrieval-augmented generation (RAG) approaches overlook the principles of evidence-based medicine (EBM), which leaves two key gaps: (1) queries and retrieved evidence are not aligned on PICO (Population, Intervention, Comparison, Outcome) elements, and (2) reranking ignores the evidence hierarchy. We present SR-RAG, an EBM-adapted GraphRAG framework that integrates the PICO framework into knowledge graph construction and retrieval, and we propose Bayesian Evidence Tier Reranking (BETR), which calibrates ranking scores by evidence grade without predefined weights. We validate SR-RAG in sports rehabilitation and release a knowledge graph (357,844 nodes, 371,226 edges) and a benchmark of 1,637 QA pairs. SR-RAG achieves 0.812 evidence recall@10, 0.830 nugget coverage, 0.819 answer faithfulness, 0.882 semantic similarity, and 0.788 PICOT match accuracy, substantially outperforming five baselines. Five expert clinicians rated the system 4.66--4.84 on a 5-point Likert scale, and system rankings are preserved on a human-verified gold subset (n=80).
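To make the headline retrieval metric concrete, the following is a minimal sketch of how evidence recall@k is commonly computed: the fraction of gold evidence items that appear among the top k retrieved items, averaged over queries. The abstract does not give the exact definition used for SR-RAG, so this formulation, the function names `recall_at_k` and `mean_recall_at_k`, and the toy document IDs are all assumptions made for illustration.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Iterable[str], k: int = 10) -> float:
    """Fraction of gold evidence IDs found in the top-k ranked IDs.

    Assumed definition (not confirmed by the source): |gold ∩ top-k| / |gold|.
    """
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold_ids must contain at least one evidence ID")
    top_k = set(ranked_ids[:k])
    return len(gold & top_k) / len(gold)


def mean_recall_at_k(runs: Sequence[tuple[Sequence[str], Iterable[str]]], k: int = 10) -> float:
    """Average recall@k over (ranked_ids, gold_ids) pairs, one pair per query."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical toy data: two queries with made-up document IDs.
    runs = [
        (["d1", "d2", "d3"], ["d1", "d9"]),  # one of two gold items retrieved
        (["d4", "d5", "d6"], ["d5"]),        # the single gold item retrieved
    ]
    print(mean_recall_at_k(runs, k=10))
```

Under this definition, a score of 0.812 at k = 10 would mean that, on average, about 81% of each question's gold evidence appears in the first ten retrieved items.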