Large Language Models (LLMs) have significantly advanced the field of information retrieval, particularly for reranking. Listwise LLM rerankers have showcased superior performance and generalizability compared to existing supervised approaches. However, conventional listwise LLM reranking methods are inefficient because they produce the ranking as a generated ordered sequence of candidate passage identifiers. Further, they are trained with the typical language modeling objective, which treats all ranking errors uniformly, potentially at the cost of misranking highly relevant passages. Addressing these limitations, we introduce FIRST, a novel listwise LLM reranking approach that leverages the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates. Further, we incorporate a learning-to-rank loss during training, prioritizing ranking accuracy for the more relevant passages. Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining robust ranking performance, with gains across the BEIR benchmark. Finally, to illustrate the practical effectiveness of listwise LLM rerankers, we investigate their application in providing relevance feedback for retrievers during inference. Our results show that LLM rerankers can provide a stronger distillation signal than cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
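The core idea of obtaining a full ranking from the first decoding step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the logit values below are hypothetical, and in the actual method they would come from an LLM's distribution over candidate identifier tokens (e.g. "A".."D") at the first generated position.

```python
def rerank_from_first_token_logits(identifier_logits):
    """Order candidate identifiers by their logit at the first decoding step.

    identifier_logits: dict mapping each candidate passage identifier to the
    logit the model assigns that identifier's token at the first generated
    position. Sorting by logit, descending, yields the full ranking without
    decoding the entire identifier sequence.
    """
    return sorted(identifier_logits, key=identifier_logits.get, reverse=True)


# Toy logits (hypothetical values, for illustration only).
logits = {"A": 1.2, "B": 3.4, "C": 0.5, "D": 2.1}
ranking = rerank_from_first_token_logits(logits)
print(ranking)  # ['B', 'D', 'A', 'C']
```

Because only one decoding step is needed instead of generating the whole ordered sequence, this is where the reported inference speedup comes from.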