Dense encoders and LLM-based rerankers struggle with long documents: single-vector representations dilute fine-grained relevance, while cross-encoders are often too expensive for practical reranking. We present an efficient long-document reranking framework based on block-level embeddings. Each document is segmented into short blocks and encoded into block embeddings that can be precomputed offline. Given a query, we encode it once and score each candidate document by aggregating the top-k query-block similarities with a simple weighted sum, yielding a strong and interpretable block-level relevance signal. To capture dependencies among the selected blocks and suppress redundancy, we introduce Top-k Interaction Refinement (TIR), a lightweight setwise module that applies query-conditioned attention over the top-k blocks and produces a bounded residual correction to their scores. TIR introduces only a small number of parameters and operates only on the top-k blocks, keeping query-time overhead low. Experiments on long-document reranking benchmarks (TREC DL and MLDR-zh) show that block representations substantially improve over single-vector encoders, and TIR provides consistent additional gains over strong long-document reranking baselines while maintaining practical reranking latency. For example, on TREC DL 2023, NDCG@10 improves from 0.395 to 0.451 under the same block budget k = 65, using at most 4095 tokens. The resulting model supports interpretability by exposing which blocks drive each document's score and how refinement redistributes their contributions.
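The block-level scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of cosine similarity via normalized embeddings, and the uniform default weights are all assumptions for the sketch.

```python
import numpy as np

def score_document(query_emb, block_embs, k=8, weights=None):
    """Score one document by a weighted sum of its top-k query-block similarities.

    query_emb  : (d,) L2-normalized query embedding, encoded once per query.
    block_embs : (n_blocks, d) L2-normalized block embeddings, precomputable offline.
    weights    : optional (k,) weights for the weighted sum; uniform if None.
    Returns the document score and the indices of the selected blocks
    (which is what makes the signal interpretable at the block level).
    """
    sims = block_embs @ query_emb                 # cosine similarity per block
    k = min(k, sims.shape[0])                     # documents may have < k blocks
    top_idx = np.argsort(sims)[::-1][:k]          # indices of the k best blocks
    top_sims = sims[top_idx]
    if weights is None:
        weights = np.full(k, 1.0 / k)             # assumption: uniform weighting
    return float(weights[:k] @ top_sims), top_idx
```

Because block embeddings are precomputed, the per-query cost per document is one matrix-vector product plus a top-k selection.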
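The refinement idea (query-conditioned attention over the selected blocks, producing a bounded residual correction) can be sketched in the same spirit. This sketch is purely illustrative and does not reproduce TIR's actual parameterization: the identity projections, the tanh squashing, and the scale `alpha` are assumptions standing in for the module's learned parameters.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())                       # shift for numerical stability
    return z / z.sum()

def tir_refine(query_emb, top_block_embs, base_scores, alpha=0.1):
    """Hypothetical setwise refinement over the top-k selected blocks.

    Attention over the blocks is conditioned on the query; each block then
    receives a bounded (tanh-squashed) residual added to its base score,
    so refinement can redistribute contributions without overriding them.
    """
    # Query-conditioned attention: weight each block by its affinity to the query.
    gate = softmax(top_block_embs @ query_emb)    # (k,) attention over blocks
    context = gate @ top_block_embs               # (d,) set-level summary
    # Interaction term: how much each block agrees with the set summary;
    # centering it lets redundant blocks be pushed down and distinct ones up.
    interaction = top_block_embs @ context        # (k,)
    residual = alpha * np.tanh(interaction - interaction.mean())
    return base_scores + residual                 # refined block scores
```

The residual is bounded by `alpha` by construction, matching the abstract's claim that the correction is bounded and that the module operates only on the top-k blocks.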