Large language models (LLMs), with their advanced linguistic capabilities, have been employed in reranking tasks through a sequence-to-sequence approach. In this paradigm, multiple passages are reranked in a listwise manner and a textual reranked permutation is generated. However, due to the limited context window of LLMs, this paradigm requires a sliding-window strategy to iteratively handle larger candidate sets. This not only increases computational cost but also prevents the LLM from fully capturing the comparison information across all candidates. To address these challenges, we propose a novel self-calibrated listwise reranking method that leverages LLMs to produce global relevance scores for ranking. To this end, we first propose a relevance-aware listwise reranking framework, which incorporates explicit list-view relevance scores to improve reranking efficiency and enable global comparison across the entire candidate set. Second, to ensure the comparability of the computed scores, we propose self-calibrated training, which uses point-view relevance assessments generated internally by the LLM itself to calibrate the list-view relevance assessments. Extensive experiments and comprehensive analysis on the BEIR benchmark and the TREC Deep Learning Tracks demonstrate the effectiveness and efficiency of the proposed method.
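The core ideas above — global list-view scores that avoid sliding windows, and a calibration objective that aligns them with point-view scores — can be illustrated with a minimal sketch. This is not the paper's implementation: the score values are made up, and the KL-divergence calibration objective is an assumption chosen to show what "making list-view scores comparable to point-view assessments" could mean concretely.

```python
import math

def softmax(scores):
    """Convert raw relevance scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q); assumes strictly positive probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical relevance scores for one query over four candidate passages.
list_view_scores = [2.1, 0.3, 1.7, -0.5]   # from a single listwise pass
point_view_scores = [1.9, 0.1, 1.5, -0.2]  # from per-passage (pointwise) prompts

# A global ranking comes directly from the list-view scores -- no
# sliding window over overlapping candidate subsets is needed.
ranking = sorted(range(len(list_view_scores)),
                 key=lambda i: list_view_scores[i], reverse=True)

# Illustrative self-calibration objective: pull the list-view score
# distribution toward the point-view distribution, so that scores
# produced for different candidate lists remain mutually comparable.
calibration_loss = kl_divergence(softmax(point_view_scores),
                                 softmax(list_view_scores))

print(ranking)  # indices of passages from most to least relevant
```

In training, a loss like `calibration_loss` would be minimized jointly with the ranking objective; at inference, only the single listwise pass is needed, which is the source of the claimed efficiency gain over sliding-window reranking.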