In this paper, we present TWOLAR: a two-stage pipeline for passage reranking based on the distillation of knowledge from Large Language Models (LLMs). TWOLAR introduces a new scoring strategy and a distillation process consisting of the creation of a novel and diverse training dataset. The dataset consists of 20K queries, each associated with a set of documents retrieved via four distinct retrieval methods to ensure diversity, and then reranked by exploiting the zero-shot reranking capabilities of an LLM. Our ablation studies demonstrate the contribution of each new component we introduce. Our experimental results show that TWOLAR significantly enhances the document reranking ability of the underlying model, matching, and in some cases even outperforming, state-of-the-art models with three orders of magnitude more parameters on the TREC-DL test sets and the zero-shot evaluation benchmark BEIR. To facilitate future work, we release our dataset, finetuned models, and code.