Previous LLM-based passage re-rankers are often expensive and slow because limited input context forces them to make many dependent model calls. We study how recent long-context LLMs change this problem: when the full set of retrieved candidate passages can be shown to the model at once, the ranking no longer has to be reconstructed from many overlapping local comparisons. We propose Whole-Pool Setwise re-ranking, in which each call considers all currently unranked candidate passages. We also introduce DualEnd, which identifies both the most and the least relevant passage in a single call. Because it fills the ranking from both ends, DualEnd ranks 100 candidates with 50 serial LLM calls, compared with 99 calls for comparable whole-pool methods that select one passage at a time. We run experiments with nine open-weight LLMs on two passage re-ranking benchmarks, measuring effectiveness, call count, token use, runtime, and output reliability. The results show that long context is not merely more prompt space but an opportunity to make LLM re-rankers both effective and efficient.
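The call-count arithmetic can be checked with a minimal sketch. It assumes a hypothetical `pick` function that stands in for one whole-pool LLM call and returns the most and least relevant passage IDs from the current pool. `make_oracle` is a hypothetical mock that reads fixed relevance scores instead of querying a model. Prompt construction, output parsing, and error recovery are omitted, so this is an illustration of the two-ended filling idea, not the paper's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for one whole-pool LLM call:
# (query, unranked pool) -> (most_relevant_id, least_relevant_id).
PickFn = Callable[[str, Sequence[str]], Tuple[str, str]]
# Hypothetical stand-in for a one-passage-at-a-time call: returns only the best ID.
PickBestFn = Callable[[str, Sequence[str]], str]


def dualend_rerank(query: str, candidates: Sequence[str], pick: PickFn) -> Tuple[List[str], int]:
    """Fill the ranking from both ends; return (ranking, number_of_calls)."""
    pool = list(candidates)
    top: List[str] = []     # filled best-first
    bottom: List[str] = []  # filled worst-first, reversed when assembling the ranking
    calls = 0
    while len(pool) > 1:
        best, worst = pick(query, pool)
        calls += 1
        if best == worst:
            raise ValueError("pick() must return two distinct passages")
        top.append(best)
        bottom.append(worst)
        pool.remove(best)
        pool.remove(worst)
    # With an odd number of candidates, one passage remains and needs no call.
    return top + pool + bottom[::-1], calls


def one_at_a_time_rerank(query: str, candidates: Sequence[str], pick_best: PickBestFn) -> Tuple[List[str], int]:
    """Baseline: each call extracts only the single most relevant passage."""
    pool = list(candidates)
    ranking: List[str] = []
    calls = 0
    while len(pool) > 1:
        best = pick_best(query, pool)
        calls += 1
        ranking.append(best)
        pool.remove(best)
    return ranking + pool, calls


def make_oracle(scores: Dict[str, float]) -> PickFn:
    """Mock LLM: picks max/min by fixed scores (assumes distinct scores)."""
    def pick(query: str, pool: Sequence[str]) -> Tuple[str, str]:
        return max(pool, key=lambda p: scores[p]), min(pool, key=lambda p: scores[p])
    return pick


if __name__ == "__main__":
    scores = {f"p{i}": float(i) for i in range(100)}
    _, dual_calls = dualend_rerank("q", list(scores), make_oracle(scores))
    _, base_calls = one_at_a_time_rerank(
        "q", list(scores), lambda q, pool: max(pool, key=lambda p: scores[p])
    )
    print(dual_calls, base_calls)  # prints "50 99"
```

For n candidates, the two-ended loop makes floor(n/2) serial calls, because each call removes two passages. The one-at-a-time loop makes n - 1 calls, because the last remaining passage needs no call.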