Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval over the learned sparse representation produced by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top-k retrieval with other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping can visibly degrade relevance when the BM25 model is not well aligned with the learned weight model or when the retrieval depth k is small. This paper generalizes the previous work and optimizes BM25-guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although this can come at the cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining relevance effectiveness. This paper analyzes the competitiveness of the two-level pruning scheme and evaluates its tradeoff between ranking relevance and time efficiency when searching several test datasets.
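The idea of steering index skipping with a blend of BM25 and learned weights at two levels can be illustrated with a toy sketch. The abstract does not specify the control scheme, so everything below is an assumption made for illustration: the function `two_level_topk`, the blending parameters `alpha` (document-level skipping) and `beta` (candidate-level filtering), the in-memory `TOY_POSTINGS` layout, and the choice to rank survivors by learned weight only. It is a simplified document-at-a-time loop, not the paper's MaxScore implementation.

```python
import heapq

# Hypothetical toy index: term -> {doc_id: (bm25_weight, learned_weight)}.
TOY_POSTINGS = {
    "a": {1: (2.0, 1.0), 2: (1.0, 4.0), 3: (0.5, 2.0), 5: (1.5, 0.5)},
    "b": {2: (1.0, 3.0), 4: (3.0, 1.5), 5: (0.5, 0.25)},
}


def _threshold(heap, k):
    # Smallest of the current top-k values, or -inf until k values are held.
    return heap[0] if len(heap) == k else float("-inf")


def _offer(heap, k, item):
    # Keep only the k largest items in a min-heap.
    if len(heap) < k:
        heapq.heappush(heap, item)
    elif item > heap[0]:
        heapq.heapreplace(heap, item)


def _blend(w, pair):
    # w = 1 uses BM25 only, w = 0 uses the learned weight only.
    return w * pair[0] + (1.0 - w) * pair[1]


def two_level_topk(postings, query, k, alpha=1.0, beta=0.5):
    """Return (top-k list of (learned_score, doc_id), number of docs scored)."""
    terms = [t for t in query if t in postings]
    # Per-term upper bounds under the alpha-blended weights.
    ub = {t: max(_blend(alpha, p) for p in postings[t].values()) for t in terms}
    global_heap, local_heap, rank_heap = [], [], []
    scored = 0
    for doc in sorted(set().union(*(postings[t] for t in terms))):
        hits = [t for t in terms if doc in postings[t]]
        # Level 1: skip the document if its blended upper bound cannot
        # exceed the current alpha-blended top-k threshold.
        if sum(ub[t] for t in hits) <= _threshold(global_heap, k):
            continue
        scored += 1
        pairs = [postings[t][doc] for t in hits]
        _offer(global_heap, k, sum(_blend(alpha, p) for p in pairs))
        # Level 2: drop the candidate if its beta-blended score is too low.
        local = sum(_blend(beta, p) for p in pairs)
        if local <= _threshold(local_heap, k):
            continue
        _offer(local_heap, k, local)
        # Survivors are ranked by the learned weights alone (an assumption).
        _offer(rank_heap, k, (sum(p[1] for p in pairs), doc))
    return sorted(rank_heap, key=lambda x: (-x[0], x[1])), scored


def brute_force_topk(postings, query, k):
    # Exhaustive baseline: score every document by learned weights only.
    terms = [t for t in query if t in postings]
    docs = set().union(*(postings[t] for t in terms))
    scores = [(sum(postings[t][d][1] for t in terms if d in postings[t]), d)
              for d in docs]
    return sorted(scores, key=lambda x: (-x[0], x[1]))[:k]
```

On this toy index with `k=2`, setting `alpha=0, beta=0` makes both levels use learned weights only and reproduces the exhaustive ranking, whereas `alpha=1` (BM25-only skipping) scores fewer documents but skips document 3, which belongs in the learned top 2. This mirrors, in miniature, the kind of relevance loss the abstract attributes to unconstrained BM25-driven skipping when the two models are poorly aligned.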