Learned sparse retrieval, which can efficiently perform retrieval through mature inverted-index engines, has garnered growing attention in recent years. In particular, inference-free sparse retrievers are attractive because they eliminate online model inference in the retrieval phase, thereby avoiding its heavy computational cost and offering reasonable throughput and latency. However, even the state-of-the-art (SOTA) inference-free sparse models lag far behind both sparse and dense siamese models in search relevance. To make inference-free sparse retrievers competitive in search relevance, we argue that they deserve dedicated training methods rather than the ones used for siamese encoders. In this paper, we propose two approaches for performance improvement. First, we introduce the IDF-aware FLOPS loss, which incorporates Inverse Document Frequency (IDF) into the sparsification of representations. We find that it mitigates the negative impact of the FLOPS regularization on search relevance, allowing the model to achieve a better balance between accuracy and efficiency. Second, we propose a heterogeneous ensemble knowledge distillation framework that combines siamese dense and sparse retrievers to generate supervisory signals during the pre-training phase. The ensemble of dense and sparse retrievers capitalizes on their complementary strengths, providing a strong upper bound for knowledge distillation. To reconcile the diverse feedback from heterogeneous supervisors, we normalize and then aggregate the outputs of the teacher models to eliminate differences in score scale. On the BEIR benchmark, our model outperforms the existing SOTA inference-free sparse model by \textbf{3.3 NDCG@10 score}. It exhibits search relevance comparable to siamese sparse retrievers, with client-side latency only \textbf{1.1x that of BM25}.
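To make the two techniques above concrete, the following is a minimal sketch of the loss and aggregation forms involved; the exact weighting and normalization schemes are assumptions for illustration, not the paper's definitive formulation. The standard FLOPS regularizer penalizes the squared mean activation of each vocabulary token over a batch of $N$ documents:
\[
\mathcal{L}_{\text{FLOPS}} = \sum_{j=1}^{|V|} \bar{w}_j^{\,2}, \qquad \bar{w}_j = \frac{1}{N}\sum_{i=1}^{N} w_j^{(i)},
\]
where $w_j^{(i)}$ is the weight of token $j$ in document $i$'s sparse representation. An IDF-aware variant can be written with a per-token coefficient $\lambda_j$ that decreases as $\mathrm{IDF}(j)$ grows (one plausible instantiation; the paper may use a different weighting):
\[
\mathcal{L}_{\text{IDF-FLOPS}} = \sum_{j=1}^{|V|} \lambda_j\, \bar{w}_j^{\,2}, \qquad \lambda_j \propto \frac{1}{\mathrm{IDF}(j)},
\]
so that rare, informative tokens are sparsified less aggressively than frequent ones. For the heterogeneous distillation, per-query score normalization (e.g., a z-score over the candidate list; the choice of normalizer here is likewise an assumption) followed by averaging over the $K$ teachers eliminates scale differences:
\[
\tilde{s}_k(q,d) = \frac{s_k(q,d) - \mu_k(q)}{\sigma_k(q)}, \qquad s_{\text{ens}}(q,d) = \frac{1}{K}\sum_{k=1}^{K} \tilde{s}_k(q,d),
\]
where $\mu_k(q)$ and $\sigma_k(q)$ are the mean and standard deviation of teacher $k$'s scores over the candidates of query $q$.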