Decoder-only LLM rerankers struggle with long documents: inference is costly and relevance signals can be diluted by irrelevant context. Motivated by an attention analysis indicating a consistent degradation trend when non-relevant text is appended, we propose EviRerank, an evidence-based long-document reranking framework for decoder-only LLMs. EviRerank (i) scores document blocks with a lightweight selector (BM25, bi-encoder, or cross-encoder), (ii) constructs a compact reranking context under a hard token cap by dynamically budgeting evidence blocks with Adaptive Evidence Budgeting (AEB) and adding a global summary cue via Summary Augmentation (SA), and (iii) reranks with a decoder-only LLM. Across TREC DL'19, DL'23, and MLDR-zh, EviRerank consistently outperforms full-document LLM reranking and strong block-selection baselines while substantially reducing the required input length. On TREC DL'19, EviRerank achieves 0.743 nDCG@10 and 0.307 MAP, establishing a new best result and improving over RankLLaMA (0.701/0.288) by +0.042 nDCG@10 (+6.0%) and +0.019 MAP (+6.6%).
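To make the three-stage pipeline concrete, the sketch below shows one plausible reading of how the compact reranking context could be assembled. It is a minimal illustration, not the authors' released implementation: the helper names (split_into_blocks, selector_score, summarize, llm_relevance_score, count_tokens), the greedy budgeting rule used for Adaptive Evidence Budgeting, and the 1024-token cap are all assumptions.

```python
# Hypothetical sketch of the EviRerank pipeline described in the abstract.
# All callables passed in (block splitter, selector, summarizer, LLM scorer)
# are stand-ins for components the paper leaves unspecified here.
from typing import Callable, List


def evi_rerank_context(
    query: str,
    document: str,
    split_into_blocks: Callable[[str], List[str]],   # e.g. fixed-size passages
    selector_score: Callable[[str, str], float],      # BM25 / bi-encoder / cross-encoder
    summarize: Callable[[str], str],                  # Summary Augmentation (SA) cue
    count_tokens: Callable[[str], int],
    token_cap: int = 1024,                            # hard input cap (assumed value)
) -> str:
    """Build the compact context fed to the decoder-only LLM reranker."""
    # (i) score every document block with the lightweight selector.
    blocks = split_into_blocks(document)
    ranked = sorted(blocks, key=lambda b: selector_score(query, b), reverse=True)

    # (ii) Summary Augmentation: reserve room for a global summary cue first.
    summary = summarize(document)
    context_parts = [summary]
    budget = token_cap - count_tokens(summary)

    # Adaptive Evidence Budgeting (one plausible reading): greedily admit the
    # highest-scoring evidence blocks until the token budget is exhausted.
    for block in ranked:
        cost = count_tokens(block)
        if cost <= budget:
            context_parts.append(block)
            budget -= cost

    return "\n".join(context_parts)


def rerank(
    query: str,
    docs: List[str],
    llm_relevance_score: Callable[[str, str], float],  # decoder-only LLM scorer
    **context_kwargs,
) -> List[str]:
    # (iii) rerank documents by the LLM's score over their compact contexts.
    contexts = {d: evi_rerank_context(query, d, **context_kwargs) for d in docs}
    return sorted(docs, key=lambda d: llm_relevance_score(query, contexts[d]), reverse=True)
```

Under this reading, the LLM never sees the full document: it scores only the summary cue plus the selector's top evidence blocks, which is what keeps the input length well below full-document reranking while preserving the relevance signal.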