Two-hop QA retrieval splits queries into two regimes, determined by whether the hop-2 entity is named explicitly in the question (Q-dominant) or appears only in the bridge passage (B-dominant). We formalize this split with three theorems. (T1) Per-query AUC is a monotone function of the cosine separation margin, with R² ≥ 0.90 for six of eight type-encoder pairs. (T2) The regime is characterized by two surface-text predicates: P1 decides the routing direction and P2 qualifies the B-dominant case. This holds across three encoders and three datasets. (T3) The bridge advantage depends on the relation-bearing sentence rather than on the entity name alone, and removing that sentence causes an 8.6-14.1 pp performance drop (p < 0.001). Building on this theory, we propose RegimeRouter, a lightweight binary router that chooses between question-only and question-plus-relation-sentence retrieval using five text features derived directly from the predicate definitions. Trained on 2WikiMultiHopQA (n = 881, 5-fold cross-fitted) and applied zero-shot to MuSiQue and HotpotQA, RegimeRouter improves R@5 by +5.6 pp (p < 0.001), +5.3 pp (p = 0.002), and +1.1 pp (not significant; no regret) on 2WikiMultiHopQA, MuSiQue, and HotpotQA, respectively, while remaining interpretable.
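The routing step described in the abstract can be sketched as a small logistic-regression router with cross-fitted predictions. This is a minimal sketch under stated assumptions, not RegimeRouter itself. The abstract does not list the five features, the classifier type, or how training labels are built. The following are therefore hypothetical: `text_features` and its five proxies, the logistic model trained by `train_logreg`, the deterministic fold assignment in `cross_fit_predictions`, and the label convention used by `route_query` (1 means "append the relation sentence", i.e. B-dominant). `recall_at_k` uses a common definition of R@k, the fraction of gold passages found in the top k. That definition is also an assumption.

```python
import math
import re
from typing import List, Sequence, Set, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def text_features(question: str, relation_sentence: str) -> List[float]:
    # HYPOTHETICAL: five illustrative surface features, not the paper's exact set.
    q_words = re.findall(r"\w+", question)
    r_words = re.findall(r"\w+", relation_sentence)
    # Capitalized words after the first token: a crude named-entity proxy.
    named = [t for t in q_words[1:] if t[0].isupper()]
    q_set = {t.lower() for t in q_words}
    r_set = {t.lower() for t in r_words}
    overlap = len(q_set & r_set) / max(len(q_set), 1)
    return [
        float(len(named) >= 2),  # P1 proxy: a second entity is named in the question
        float(len(named)),       # count of entity-like words in the question
        len(q_words) / 10.0,     # question length, roughly rescaled
        overlap,                 # question / relation-sentence word overlap
        len(r_words) / 10.0,     # relation-sentence length, roughly rescaled
    ]


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 300) -> Tuple[List[float], float]:
    # Plain batch gradient descent on the logistic (cross-entropy) loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi
            for j, xij in enumerate(xi):
                gw[j] += err * xij
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def cross_fit_predictions(X: List[List[float]], y: List[int], k: int = 5) -> List[bool]:
    # Each item is scored by a model trained only on the other k-1 folds.
    # Fold assignment is deterministic (i, i+k, i+2k, ...); shuffle first in practice.
    n = len(X)
    preds = [False] * n
    for start in range(k):
        held_out = list(range(start, n, k))
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        w, b = train_logreg([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = sigmoid(dot(w, X[i]) + b) >= 0.5
    return preds


def route_query(question: str, relation_sentence: str,
                w: List[float], b: float, threshold: float = 0.5) -> str:
    # Assumed label convention: high score = B-dominant, so append the relation sentence.
    p = sigmoid(dot(w, text_features(question, relation_sentence)) + b)
    return f"{question} {relation_sentence}" if p >= threshold else question


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int = 5) -> float:
    # Fraction of gold passages retrieved in the top k (assumed R@k definition).
    if not gold_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(gold_ids)) / len(set(gold_ids))
```

Cross-fitting scores every training query with a model that never saw it, so in-domain routing gains are not inflated by fitting and evaluating on the same items. A zero-shot setting like the one described would instead fit `train_logreg` once on all source-domain queries and then call `route_query` on target-domain questions.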