Dense encoders and LLM-based rerankers struggle with long documents: single-vector representations dilute fine-grained relevance, while cross-encoders are often too expensive for practical reranking. We present an efficient long-document reranking framework based on block-level embeddings. Each document is segmented into short blocks and encoded into block embeddings that can be precomputed offline. Given a query, we encode it once and score each candidate document by aggregating the top-k query-block similarities with a simple weighted sum, yielding a strong and interpretable block-level relevance signal. To capture dependencies among the selected blocks and suppress redundancy, we introduce Top-k Interaction Refinement (TIR), a lightweight setwise module that applies query-conditioned attention over the top-k blocks and produces a bounded residual correction to their scores. TIR introduces only a small number of parameters and operates only on the top-k blocks, keeping query-time overhead low. Experiments on long-document reranking benchmarks (TREC DL and MLDR-zh) show that block representations substantially improve over single-vector encoders, and TIR provides consistent additional gains over strong long-document reranking baselines while maintaining practical reranking latency. For example, on TREC DL 2023, NDCG@10 improves from 0.395 to 0.451 under the same block budget k = 65, using at most 4095 tokens. The resulting model supports interpretability by exposing which blocks drive each document's score and how refinement redistributes their contributions.
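The block-level scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of cosine similarity via normalized embeddings, and the uniform default weights are all assumptions for the sketch.

```python
import numpy as np

def score_document(query_emb, block_embs, k=8, weights=None):
    """Score one document by a weighted sum of its top-k query-block similarities.

    query_emb  : (d,) L2-normalized query embedding, encoded once per query.
    block_embs : (n_blocks, d) L2-normalized block embeddings, precomputable offline.
    weights    : optional (k,) weights for the weighted sum; uniform if None.
    Returns the document score and the indices of the selected blocks
    (which is what makes the signal interpretable at the block level).
    """
    sims = block_embs @ query_emb                 # cosine similarity per block
    k = min(k, sims.shape[0])                     # documents may have < k blocks
    top_idx = np.argsort(sims)[::-1][:k]          # indices of the k best blocks
    top_sims = sims[top_idx]
    if weights is None:
        weights = np.full(k, 1.0 / k)             # assumption: uniform weighting
    return float(weights[:k] @ top_sims), top_idx
```

Because block embeddings are precomputed, the per-query cost per document is one matrix-vector product plus a top-k selection.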
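The refinement idea (query-conditioned attention over the selected blocks, producing a bounded residual correction) can be sketched in the same spirit. This sketch is purely illustrative and does not reproduce TIR's actual parameterization: the identity projections, the tanh squashing, and the scale `alpha` are assumptions standing in for the module's learned parameters.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())                       # shift for numerical stability
    return z / z.sum()

def tir_refine(query_emb, top_block_embs, base_scores, alpha=0.1):
    """Hypothetical setwise refinement over the top-k selected blocks.

    Attention over the blocks is conditioned on the query; each block then
    receives a bounded (tanh-squashed) residual added to its base score,
    so refinement can redistribute contributions without overriding them.
    """
    # Query-conditioned attention: weight each block by its affinity to the query.
    gate = softmax(top_block_embs @ query_emb)    # (k,) attention over blocks
    context = gate @ top_block_embs               # (d,) set-level summary
    # Interaction term: how much each block agrees with the set summary;
    # centering it lets redundant blocks be pushed down and distinct ones up.
    interaction = top_block_embs @ context        # (k,)
    residual = alpha * np.tanh(interaction - interaction.mean())
    return base_scores + residual                 # refined block scores
```

The residual is bounded by `alpha` by construction, matching the abstract's claim that the correction is bounded and that the module operates only on the top-k blocks.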