Effective information retrieval (IR) from vast datasets relies on advanced techniques to extract relevant information in response to queries. Recent advancements in dense retrieval have showcased remarkable efficacy compared to traditional sparse retrieval methods. To further enhance retrieval performance, knowledge distillation techniques, often leveraging robust cross-encoder rerankers, have been extensively explored. However, existing approaches primarily distill knowledge from pointwise rerankers, which assign absolute relevance scores to documents and thus face challenges related to inconsistent comparisons. This paper introduces Pairwise Relevance Distillation (PairDistill), which leverages pairwise reranking to offer fine-grained distinctions between similarly relevant documents and thereby enrich the training of dense retrieval models. Our experiments demonstrate that PairDistill outperforms existing methods, achieving new state-of-the-art results across multiple benchmarks. These results highlight the potential of PairDistill to advance dense retrieval techniques. Our source code and trained models are released at https://github.com/MiuLab/PairDistill.
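To make the core idea concrete, the following is a minimal sketch of pairwise relevance distillation: the student retriever's score difference for a document pair is converted into a preference probability and pulled toward the teacher reranker's pairwise preference via binary cross-entropy. The function names and the exact loss form here are illustrative assumptions, not the paper's implementation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_distill_loss(s_i: float, s_j: float, p_teacher: float) -> float:
    """Hypothetical pairwise distillation loss.

    s_i, s_j: the student retriever's relevance scores for documents i and j.
    p_teacher: the pairwise reranker's probability that doc i is more
               relevant than doc j.
    Returns the binary cross-entropy between the teacher's preference and
    the student's preference sigmoid(s_i - s_j).
    """
    p_student = sigmoid(s_i - s_j)
    eps = 1e-12  # numerical safety for log
    return -(p_teacher * math.log(p_student + eps)
             + (1.0 - p_teacher) * math.log(1.0 - p_student + eps))
```

When the student already agrees with the teacher (e.g. `s_i` much larger than `s_j` while the teacher prefers doc i), the loss is near zero; when the student's ranking contradicts the teacher's preference, the loss grows, pushing the retriever's scores toward the fine-grained pairwise ordering.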