BridgeRAG: Training-Free Bridge-Conditioned Retrieval for Multi-Hop Question Answering

Multi-hop retrieval is not a single-step relevance problem: later-hop evidence should be ranked by its utility conditioned on retrieved bridge evidence, not by similarity to the original query alone. We present BridgeRAG, a training-free, graph-free retrieval method for retrieval-augmented generation (RAG) over multi-hop questions that operationalizes this view with a tripartite scorer s(q,b,c) over (question, bridge, candidate). BridgeRAG separates coverage from scoring: dual-entity ANN expansion broadens the second-hop candidate pool, while a bridge-conditioned LLM judge identifies the active reasoning chain among competing candidates without any offline graph or proposition index. Across four controlled experiments we show that this conditioning signal is (i) selective: +2.55pp on parallel-chain queries (p<0.001) vs. ~0 on single-chain subtypes; (ii) irreplaceable: substituting the retrieved passage with generated SVO query text reduces R@5 by 2.1pp, performing worse than even the lowest-SVO-similarity pool passage; (iii) predictable: cos(b,g2) correlates with per-query gain (Spearman rho=0.104, p<0.001); and (iv) mechanistically precise: bridge conditioning causes productive re-rankings (18.7% flip-win rate on parallel-chain vs. 0.6% on single-chain), not merely more churn. Combined with lightweight coverage expansion and percentile-rank score fusion, BridgeRAG achieves the best published training-free R@5 under matched benchmark evaluation on all three standard MHQA benchmarks without a graph database or any training: 0.8146 on MuSiQue (+3.1pp vs. PropRAG, +6.8pp vs. HippoRAG2), 0.9527 on 2WikiMultiHopQA (+1.2pp vs. PropRAG), and 0.9875 on HotpotQA (+1.35pp vs. PropRAG).
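To make the scoring step concrete, the following is a minimal sketch of how a bridge-conditioned score s(q,b,c) might be combined with a query-only retrieval score through percentile-rank fusion. It is an illustration under stated assumptions, not the authors' implementation: the function names (`percentile_ranks`, `fuse_and_rank`), the equal default weighting, the tie handling, and the `judge` callable standing in for the LLM judge are all hypothetical and do not come from the abstract.

```python
from typing import Callable, Dict, List


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each raw score to its percentile rank in [0, 1] within the pool.

    Assumption: ties share the average rank. Converting to ranks puts
    scores from different scorers on a common scale before fusion.
    """
    n = len(scores)
    if n <= 1:
        return {cid: 1.0 for cid in scores}
    ranks = {}
    for cid, s in scores.items():
        lower = sum(1 for other in scores.values() if other < s)
        ties = sum(1 for other in scores.values() if other == s) - 1  # exclude self
        ranks[cid] = (lower + 0.5 * ties) / (n - 1)
    return ranks


def fuse_and_rank(
    question: str,
    bridge: str,
    candidates: List[str],
    dense_scores: Dict[str, float],
    judge: Callable[[str, str, str], float],
    weight: float = 0.5,
    k: int = 5,
) -> List[str]:
    """Rank second-hop candidates by fused percentile ranks.

    `dense_scores` holds query-only similarity per candidate id.
    `judge(question, bridge, candidate)` is a hypothetical stand-in for
    the bridge-conditioned scorer s(q, b, c); here it is any callable
    returning a float. `weight` is the share given to the judge
    (assumed value, not taken from the paper).
    """
    judge_scores = {c: judge(question, bridge, c) for c in candidates}
    dense_pr = percentile_ranks({c: dense_scores[c] for c in candidates})
    judge_pr = percentile_ranks(judge_scores)
    fused = {
        c: weight * judge_pr[c] + (1.0 - weight) * dense_pr[c]
        for c in candidates
    }
    # Python's sort is stable, so ties keep the input candidate order.
    return sorted(candidates, key=lambda c: fused[c], reverse=True)[:k]
```

In this sketch a candidate that looks weak by query similarity alone can still rise to the top when the bridge-conditioned score favors it, which is the behavior the abstract attributes to conditioning on retrieved bridge evidence. In practice `judge` would wrap an LLM call; any deterministic function can be substituted for testing.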
