The rapid development of large language models (LLMs) like Llama has significantly advanced information retrieval (IR) systems. However, applying LLMs such as RankLLaMA to long documents remains challenging due to computational complexity that grows with input token length. Furthermore, the internal mechanisms of LLMs during ranking are still not fully understood. In this paper, we first explore the internal workings of LLMs during relevance judgement and identify that specific attention heads play a crucial role in aligning relevant tokens. This observation inspires us to revisit the block pre-ranking strategy used in KeyB, which remains state-of-the-art (SOTA) on the TREC 2019 DL document ranking dataset. Building on these insights, we develop KeyB2, an advanced long-document IR approach that combines block pre-ranking with the ranking power of LLMs. KeyB2 efficiently identifies and processes the most relevant blocks, reducing computational costs and improving ranking effectiveness. Additionally, we introduce a new bi-encoder block matching strategy for KeyB2. Comprehensive experiments on long-document datasets, including TREC 2019 DL, Robust04, and MLDR-zh, show that KeyB2 outperforms baselines such as RankLLaMA and KeyB, reducing reranking time and GPU memory usage while improving retrieval effectiveness, and achieves new SOTA results on TREC 2019 DL with higher NDCG@10 and MAP scores.
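The block pre-ranking idea described above can be illustrated with a minimal sketch: a long document is split into fixed-size token blocks, each block is scored against the query with a lightweight matcher, and only the top-k blocks are passed on to the expensive LLM reranker. The sketch below uses a simple bag-of-words cosine score as a stand-in for a trained bi-encoder; all function names are hypothetical and not the authors' implementation.

```python
# Hypothetical sketch of block pre-ranking for long-document reranking.
# A real system (e.g., KeyB2) would replace `encode` with a trained
# bi-encoder; here a bag-of-words cosine score stands in for it.
import numpy as np

def split_into_blocks(tokens, block_size):
    """Chunk a token list into consecutive blocks of at most block_size."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]

def encode(tokens, vocab):
    """L2-normalized bag-of-words vector (stand-in for a bi-encoder)."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab[t]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def pre_rank_blocks(query_tokens, doc_tokens, block_size=5, top_k=2):
    """Score each block against the query, keep the top_k blocks
    (in original document order), and return them with all scores."""
    blocks = split_into_blocks(doc_tokens, block_size)
    vocab = {t: i for i, t in enumerate(dict.fromkeys(query_tokens + doc_tokens))}
    q = encode(query_tokens, vocab)
    scores = [float(encode(b, vocab) @ q) for b in blocks]
    keep = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)[:top_k]
    keep.sort()  # preserve document order of the selected blocks
    selected = [tok for i in keep for tok in blocks[i]]
    return selected, scores
```

The selected token sequence, now much shorter than the full document, is what would be fed to the LLM reranker, which is how the approach reduces reranking time and GPU memory while keeping the query-relevant evidence.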