HKVM-RAG: Key-Value-Separated Hypergraph Evidence Organization for Multi-Hop RAG

Multi-hop RAG poses a data-engineering problem beyond passage matching: under fixed retrieval budgets, a system must organize retrieved text into evidence units that expose answer chains. Dense retrievers score passages independently, while graph-based memories make associations explicit but often rely on pairwise or entity-centered keys that fragment multi-hop evidence. We present HKVM-RAG, a key-value-separated evidence-organization layer. It assembles answer-path hyperedges from cached passage-level LLM evidence tuples and uses them as retrieval keys, while retaining passage text as answer values. To isolate key-space design, our fixed-substrate protocol holds the tuple cache, candidate passages, reader, and evaluation budget constant across pairwise graph and hypergraph variants. Weighted hypergraph key-value retrieval (WHG-KV) improves over KG-PPR by +3.426 F1 on 2WikiMultiHopQA and +3.592 F1 on MuSiQue; on HotpotQA, higher structured support coverage does not translate into standalone answer-F1 gains. We therefore study WHG-KV as an evidence-control signal rather than a dense-retrieval replacement. Oracle and train-to-dev analyses identify support selection as repairable. A dense-aware controller then combines frozen ColBERTv2 and HKVM rank/score features using out-of-fold HKVM predictions. It reaches 88.846, 65.073, and 85.810 F1 on the three benchmarks, improving over ColBERTv2 by +11.084, +6.763, and +5.966 F1. Source-level ablations show that matched non-WHG structured signals do not match the WHG-KV gains. These results provide bounded evidence that key-value-separated hypergraph organization can serve as a reusable evidence-control mechanism for multi-hop RAG.
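The key-value separation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hyperedge` class, the `assemble_hyperedges` chaining rule (linking two tuples when one's object equals the other's subject), the token-overlap scoring, and the toy passages are all assumptions chosen for illustration. The sketch only shows the general pattern in which hyperedges act as retrieval keys while the passage text is returned as the value.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical retrieval key linking several passages along one answer path."""
    key_tokens: frozenset  # tokens matched against the query (the "key")
    passage_ids: tuple     # pointers to passage text (the "values")
    weight: float = 1.0    # optional edge weight; uniform in this sketch


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real matching function."""
    return frozenset(re.findall(r"\w+", text.lower()))


def assemble_hyperedges(tuples_by_passage):
    """Chain cached (subject, relation, object) tuples into two-hop hyperedges.

    Assumed rule: connect tuple A to tuple B when A's object equals B's
    subject and the two tuples come from different passages.
    """
    facts = [(pid, s, r, o)
             for pid, triples in sorted(tuples_by_passage.items())
             for (s, r, o) in triples]
    edges = []
    for pid_a, s_a, r_a, o_a in facts:
        for pid_b, s_b, r_b, o_b in facts:
            if pid_a != pid_b and o_a == s_b:
                key = tokenize(" ".join((s_a, r_a, o_a, r_b, o_b)))
                edges.append(Hyperedge(key, (pid_a, pid_b)))
    return edges


def retrieve(query, edges, passages, k=2):
    """Score hyperedge keys against the query, then return passage values."""
    q = tokenize(query)
    best = {}
    for edge in edges:
        score = edge.weight * len(q & edge.key_tokens)
        for pid in edge.passage_ids:
            best[pid] = max(best.get(pid, 0.0), score)
    # Sort by descending score, breaking ties by passage id for determinism.
    ranked = sorted(best, key=lambda pid: (-best[pid], pid))
    return [(pid, passages[pid]) for pid in ranked[:k]]


# Toy data (invented for this example).
passages = {
    "p1": "Alice Smith directed the film Northwind.",
    "p2": "Northwind was produced by Harbor Studios.",
    "p3": "Harbor Studios is based in Oslo.",
}
tuples_by_passage = {
    "p1": [("Alice Smith", "directed", "Northwind")],
    "p2": [("Northwind", "produced by", "Harbor Studios")],
    "p3": [("Harbor Studios", "based in", "Oslo")],
}

edges = assemble_hyperedges(tuples_by_passage)
hits = retrieve("Which studio produced the film directed by Alice Smith?",
                edges, passages, k=2)
print([pid for pid, _ in hits])  # ['p1', 'p2']
```

In this toy case the hyperedge spanning `p1` and `p2` matches the query on both hops at once, so both supporting passages are retrieved together within the budget `k`. A pairwise or entity-centered key would have to match each hop separately.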
