Quantum Knowledge Graph: Modeling Context-Dependent Triplet Validity

Knowledge graphs (KGs) are increasingly used to support large language model (LLM) reasoning, but standard triplet-based KGs treat each relation as globally valid. In many settings, whether a relation should count as evidence depends on the context. We therefore formulate triplet validity as a triplet-specific function of context and refer to this formulation as a Quantum Knowledge Graph (QKG). We instantiate QKG in medicine using a diabetes-centered PrimeKG subgraph, whose 68,651 context-sensitive relations are further annotated with patient-group-specific constraints. We evaluate it in a reasoner--validator pipeline for medical question answering on a KG-grounded subset of MedReason containing 2,788 questions. With Haiku-4.5 as both the Reasoner and the Validator, KG-backed validation significantly improves over a no-validator baseline ($+0.61$ pp), and QKG with context matching yields the largest gain, outperforming both KG validation without context matching ($+0.79$ pp) and the no-validator baseline ($+1.40$ pp; paired McNemar, all $p<0.05$). Under a stronger validator (Qwen-3.6-Plus), the raw QKG gain over the no-validator baseline grows from $+1.40$ pp to $+5.96$ pp; the context-matching gap is non-significant ($p=0.73$) on the raw set but becomes borderline significant ($p=0.05$) after adjustment for knowledge leakage and suspicious questions, consistent with a benchmark-gold ceiling rather than a QKG limitation. Taken together, the results support the view that the value of a KG in LLM-based clinical reasoning lies not merely in storing medically related facts, but in representing whether those facts are applicable to the specific patient context. For reproducibility and further research, we release the curated QKG datasets and source code.\footnote{https://github.com/HKAI-Sci/QKG}
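
The idea of triplet validity as a triplet-specific function of context can be illustrated with a minimal sketch. It is not taken from the released code: the names `ContextualTriplet`, `is_valid`, `supporting_evidence`, and the `egfr` context field are hypothetical, and the renal-function constraint is an illustrative example rather than an annotation drawn from the QKG dataset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical patient context, e.g. {"egfr": 60}; the schema is an assumption.
Context = Dict[str, float]


def always_valid(ctx: Context) -> bool:
    """Validity function of a standard KG triplet: valid in every context."""
    return True


@dataclass(frozen=True)
class ContextualTriplet:
    head: str
    relation: str
    tail: str
    # Triplet-specific validity function mapping a context to True/False.
    is_valid: Callable[[Context], bool] = always_valid


def supporting_evidence(kg: List[ContextualTriplet], ctx: Context) -> List[ContextualTriplet]:
    """Keep only the triplets whose validity function accepts this context."""
    return [t for t in kg if t.is_valid(ctx)]


# Illustrative two-triplet graph (assumed content, not from the QKG release).
kg = [
    ContextualTriplet(
        "metformin", "indication", "type 2 diabetes",
        # Assumed constraint: not applicable under severe renal impairment.
        is_valid=lambda ctx: ctx.get("egfr", 90.0) >= 30.0,
    ),
    ContextualTriplet("type 2 diabetes", "phenotype", "hyperglycemia"),
]

evidence_ok = supporting_evidence(kg, {"egfr": 60.0})
evidence_restricted = supporting_evidence(kg, {"egfr": 20.0})
```

In this sketch, a context-insensitive triplet is simply one whose validity function is constant, so a standard KG is recovered as a special case; a validator would then consult only the triplets returned for the patient context at hand.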
