HKVM-RAG: Key-Value-Separated Hypergraph Evidence Organization for Multi-Hop RAG

Multi-hop RAG poses a data-engineering problem beyond passage matching: under a fixed retrieval budget, a system must organize retrieved text into evidence units that expose answer chains. Dense retrievers score passages independently, while graph-based memories make associations explicit but often rely on pairwise or entity-centered keys that fragment multi-hop evidence. We present HKVM-RAG, a key-value-separated evidence-organization layer. It assembles answer-path hyperedges from cached passage-level LLM evidence tuples and uses them as retrieval keys, while retaining passage text as answer values. To isolate key-space design, our fixed-substrate protocol holds the tuple cache, candidate passages, reader, and evaluation budget constant across pairwise-graph and hypergraph variants. Weighted hypergraph key-value retrieval (WHG-KV) improves over KG-PPR by +3.426 F1 on 2WikiMultiHopQA and +3.592 F1 on MuSiQue; on HotpotQA, higher structured support coverage does not translate into standalone answer-F1 gains. We therefore study WHG-KV as an evidence-control signal rather than as a replacement for dense retrieval. Oracle and train-to-dev analyses identify support selection as repairable, so we train a dense-aware controller that combines frozen ColBERTv2 and HKVM rank/score features, using out-of-fold HKVM predictions. The controller reaches 88.846, 65.073, and 85.810 F1 on the three benchmarks, improving over ColBERTv2 by +11.084, +6.763, and +5.966 F1, respectively. Source-level ablations show that matched non-WHG structured signals do not reproduce the WHG-KV gains. These results provide bounded evidence that key-value-separated hypergraph organization can serve as a reusable evidence-control mechanism for multi-hop RAG.
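
To make the key-value separation concrete, the following is a minimal illustrative sketch, not the implementation evaluated above. It assumes a hyperedge is stored as a set of evidence terms plus the ids of the passages it spans, and it uses weighted lexical overlap as a stand-in for the key-matching score, which the abstract does not specify. All names (`retrieve`, `hyperedges`, `passages`, `budget`) and the toy data are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for this toy example)."""
    return set(text.lower().split())


def retrieve(query, hyperedges, passages, budget=2):
    """Match the query against hyperedge keys, then return passage values.

    hyperedges: list of dicts with 'terms' (list of str), 'passage_ids'
    (list of str), and 'weight' (float). The scoring rule is a placeholder.
    """
    q = tokenize(query)
    scores = defaultdict(float)
    for edge in hyperedges:
        key_terms = set()
        for term in edge["terms"]:
            key_terms |= tokenize(term)
        # Fraction of the key's tokens that also occur in the query.
        overlap = len(q & key_terms) / len(key_terms) if key_terms else 0.0
        # One matched key credits every passage on its answer path.
        for pid in edge["passage_ids"]:
            scores[pid] += edge["weight"] * overlap
    # Sort by descending score; break ties by passage id for determinism.
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return [(pid, passages[pid]) for pid in ranked[:budget] if scores[pid] > 0]


# Toy data: the first hyperedge spans a two-hop answer path over p1 and p2.
passages = {
    "p1": "Film A was directed by Director B.",
    "p2": "Director B was born in City C.",
    "p3": "City D hosts a film festival.",
}
hyperedges = [
    {"terms": ["film a", "director b", "city c"],
     "passage_ids": ["p1", "p2"], "weight": 1.0},
    {"terms": ["city d", "film festival"],
     "passage_ids": ["p3"], "weight": 1.0},
]

results = retrieve("where was the director of film a born",
                   hyperedges, passages, budget=2)
print([pid for pid, _ in results])
```

In this toy setting a single matched hyperedge brings both passages of the two-hop chain into the budget together, whereas independent passage scoring would have to rank each hop on its own.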
