Learned sparse retrieval (LSR) is a popular approach for first-stage retrieval because it combines the semantic matching power of language models with efficient, CPU-friendly query processing. Previous work aggregates document blocks into "superblocks" and uses an advanced pruning heuristic to quickly skip block visitation during query processing. This paper proposes a simple and effective superblock pruning scheme that reduces the overhead of superblock score computation while preserving competitive relevance. It combines this scheme with a compact index structure and a robust zero-shot configuration that is effective across LSR models and multiple datasets. The paper provides an analytical justification and an evaluation on the MS MARCO and BEIR datasets, demonstrating that the proposed scheme can be a strong alternative for efficient sparse retrieval.
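To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of threshold-driven superblock pruning: consecutive document blocks are grouped into superblocks, each storing per-term maximum weights, and a superblock is skipped whenever its cheap score upper bound falls below a fraction of the current top-k threshold. The names `sb_max_weight`, `blocks_of`, `score_block`, and the pruning factor `mu` are illustrative assumptions, not the authors' API.

```python
# Minimal sketch of superblock pruning for learned sparse retrieval.
# Assumes a partitioned index where consecutive document blocks are grouped
# into superblocks and each superblock stores per-term maximum weights.
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

def retrieve_top_k(
    query: Dict[int, float],                # term id -> query weight
    sb_max_weight: List[Dict[int, float]],  # per superblock: term id -> max doc weight
    blocks_of: List[List[int]],             # per superblock: ids of its blocks
    score_block: Callable[[int, Dict[int, float]], Iterable[Tuple[int, float]]],
    k: int,
    mu: float = 0.9,                        # mu < 1 trades a little recall for speed
) -> List[Tuple[float, int]]:
    heap: List[Tuple[float, int]] = []      # min-heap of (score, doc) for top-k
    theta = 0.0                             # score of the current k-th best result
    for sb_id, max_w in enumerate(sb_max_weight):
        # Cheap upper bound on any document score inside this superblock:
        # sum over query terms of query weight times the superblock max weight.
        bound = sum(qw * max_w.get(t, 0.0) for t, qw in query.items())
        if bound <= mu * theta:
            continue  # prune: no document here can enter the top-k; skip all its blocks
        for blk in blocks_of[sb_id]:
            for doc, score in score_block(blk, query):  # hypothetical block scorer
                if len(heap) < k:
                    heapq.heappush(heap, (score, doc))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, doc))
                if len(heap) == k:
                    theta = heap[0][0]      # tighten the pruning threshold
    return sorted(heap, reverse=True)
```

The point of the superblock level is that a pruned superblock costs only one bound computation instead of one bound (or full scoring pass) per block, which is where the reduced overhead claimed in the abstract would come from.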