Decoder-only LLM rerankers are powerful but often struggle with long documents: inference is costly, and relevance signals can be diluted as irrelevant text accumulates in the context window. Motivated by an attention analysis showing that relevance-aligned attention heads degrade when irrelevant text is appended, we propose EviRerank, a scalable framework that (i) scores document blocks with a lightweight selector (BM25, bi-encoder, or cross-encoder), (ii) constructs a compact evidence context under a strict token budget, and (iii) reranks with a decoder-only LLM. Our key contribution is Adaptive Evidence Budgeting (AEB), an information-density-aware dynamic stopping strategy that avoids low-utility tail blocks; we further study Summary Augmentation (SA) within the same budget. Across TREC DL'19, DL'23, and MLDR-zh, EviRerank consistently improves over full-document LLM reranking and strong block-selection baselines while substantially reducing the required input length. On TREC DL'19, EviRerank achieves 0.743 nDCG@10 and 0.307 MAP, improving over RankLLaMA (0.701/0.288) by +0.042 nDCG@10 (+6.0%) and +0.019 MAP (+6.6%).
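To make step (ii) concrete, below is a minimal sketch of budgeted evidence-context construction with a density-aware early stop. The `Block` fields, the score-per-token density signal, and the `min_density` threshold are illustrative assumptions; the abstract does not specify the exact AEB stopping criterion, which is defined in the paper body.

```python
# Minimal sketch: pack selector-scored blocks under a strict token budget,
# stopping early when information density drops (hypothetical AEB stand-in).

from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    text: str
    score: float      # relevance score from the lightweight selector (e.g., BM25)
    num_tokens: int   # token count of the block under the reranker's tokenizer


def build_evidence_context(blocks: List[Block],
                           token_budget: int = 512,
                           min_density: float = 0.01) -> str:
    """Greedily keep the highest-scoring blocks within the token budget,
    stopping once a block's score-per-token density falls below a threshold
    (an assumed proxy for the paper's information-density criterion)."""
    selected, used = [], 0
    for block in sorted(blocks, key=lambda b: b.score, reverse=True):
        if used + block.num_tokens > token_budget:
            break  # strict budget: never exceed the token limit
        if block.score / max(block.num_tokens, 1) < min_density:
            break  # dynamic stopping: low-utility tail blocks are skipped
        selected.append(block.text)
        used += block.num_tokens
    return "\n".join(selected)
```

The resulting compact context would then be passed, together with the query, to the decoder-only LLM reranker in step (iii).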