Recent studies in medical question answering (Medical QA) have actively explored the integration of large language models (LLMs) with biomedical knowledge graphs (KGs) to improve factual accuracy. However, most existing approaches still rely on traversing the entire KG or performing large-scale retrieval, which introduces substantial noise and leads to unstable multi-hop reasoning. We argue that the core challenge lies not in expanding access to knowledge, but in identifying and reasoning over the appropriate subset of evidence for each query. ReGraM is a region-first knowledge graph reasoning framework that addresses this challenge by constructing a query-aligned subgraph and performing stepwise reasoning constrained to this localized region under multiple evidence-aware modes. By focusing inference on only the most relevant portion of the KG, ReGraM departs from the assumption that all relations are equally useful, an assumption that rarely holds in domain-specific medical settings. Experiments on seven medical QA benchmarks demonstrate that ReGraM consistently outperforms a strong baseline (KGARevion), achieving an 8.04% absolute accuracy gain on MCQ, a 4.50% gain on SAQ, and a 42.9% reduction in hallucination rate. Ablation and qualitative analyses further show that aligning region construction with hop-wise reasoning is the primary driver of these improvements. Overall, our results highlight region-first KG reasoning as an effective paradigm for improving factual accuracy and consistency in medical QA.
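As a rough illustration of the region-first idea, the sketch below builds a query-aligned subgraph from a toy KG before any reasoning takes place. It is not ReGraM's implementation: the triples, the verbatim string matching used to find seed entities, the relation whitelist, and the names `seed_entities` and `build_region` are all assumptions introduced here for illustration.

```python
from collections import deque

# Toy triples (head, relation, tail); illustrative only, not taken from any real KG.
TRIPLES = [
    ("metformin", "treats", "type 2 diabetes"),
    ("metformin", "interacts_with", "contrast dye"),
    ("type 2 diabetes", "associated_with", "insulin resistance"),
    ("aspirin", "treats", "headache"),
    ("headache", "symptom_of", "migraine"),
]


def seed_entities(query, triples):
    """Return KG entities whose name appears verbatim in the query (naive matching)."""
    q = query.lower()
    entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
    return {e for e in entities if e in q}


def build_region(seeds, triples, allowed_relations, max_hops=2):
    """Collect triples reachable from the seeds within max_hops, using only allowed relations."""
    adjacency = {}
    for h, r, t in triples:
        if r in allowed_relations:
            # Index each edge under both endpoints so expansion ignores edge direction.
            adjacency.setdefault(h, []).append((h, r, t))
            adjacency.setdefault(t, []).append((h, r, t))

    region = set()
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted; do not expand this node further
        for h, r, t in adjacency.get(node, []):
            region.add((h, r, t))
            neighbor = t if node == h else h
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return region


query = "Which condition does metformin treat?"
seeds = seed_entities(query, TRIPLES)
region = build_region(seeds, TRIPLES, {"treats", "associated_with"}, max_hops=2)
```

Any subsequent hop-wise reasoning step would then consult only `region` rather than the full triple set, which is the sense in which inference is constrained to a localized part of the graph. A real system would replace the string matching with entity linking and would choose relations per query instead of using a fixed whitelist.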