Large Language Models (LLMs) have significantly impacted many facets of natural language processing and information retrieval. Unlike previous encoder-based approaches, the enlarged context window of these generative models allows multiple documents to be ranked at once, commonly called list-wise ranking. However, there are still limits on the number of documents that can be ranked in a single inference pass, which has led to the broad adoption of a sliding window approach to identify the k most relevant items in a ranked list. We argue that the sliding window approach is ill-suited to list-wise re-ranking because it (1) cannot be parallelized in its current form, (2) incurs redundant computation by repeatedly re-scoring the best set of documents as it works its way up the initial ranking, and (3) takes a bottom-up approach, scoring the lowest-ranked documents first rather than prioritizing the highest-ranked ones. Motivated by these shortcomings and an initial study showing that list-wise rankers are biased towards relevant documents at the start of their context window, we propose a novel algorithm that partitions a ranking to depth k and processes documents top-down. Unlike sliding window approaches, our algorithm is inherently parallelizable due to its use of a pivot element, which can be compared against documents down to an arbitrary depth concurrently. In doing so, we reduce the expected number of inference calls by around 33% when ranking at depth 100, while matching the performance of prior approaches across multiple strong re-rankers.
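The pivot-based, top-down partitioning idea can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: `rank_window` stands in for a list-wise LLM re-ranker (stubbed here with a plain sort so the sketch runs), and the window size, batching, and pivot choice are simplifying assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_window(query, docs):
    # Stand-in for a list-wise LLM re-ranker: returns docs ordered by
    # relevance. Stubbed with a sort (lower value = more relevant) so the
    # sketch is runnable; a real system would prompt an LLM here.
    return sorted(docs)


def top_down_rerank(query, ranking, k, window=20):
    """Hypothetical sketch of pivot-based top-down partitioning."""
    # 1. Rank the top `window` documents once; the k-th item becomes the
    #    pivot, and the items above it are initial top-k candidates.
    head = rank_window(query, ranking[:window])
    pivot = head[min(k, len(head)) - 1]
    candidates = head[:k]

    # 2. Compare every remaining batch against the pivot. Because each
    #    batch only needs the pivot (not the other batches' results),
    #    the comparisons can run concurrently -- unlike a sliding
    #    window, which is inherently sequential.
    tail = ranking[window:]
    step = window - 1  # leave one slot in each batch for the pivot
    batches = [tail[i:i + step] for i in range(0, len(tail), step)]

    def compare(batch):
        ordered = rank_window(query, batch + [pivot])
        # Keep only documents the model placed above the pivot.
        return ordered[:ordered.index(pivot)]

    with ThreadPoolExecutor() as pool:
        for winners in pool.map(compare, batches):
            candidates.extend(winners)

    # 3. A final list-wise pass over the surviving candidates yields the
    #    top-k result.
    return rank_window(query, candidates)[:k]
```

With the sort stub, any document that truly belongs in the top k must beat the pivot in its batch, so only the (usually few) batch "winners" reach the final re-ranking pass; the bulk of the tail is discarded after a single pivot comparison each, which is where the saving in inference calls comes from.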