Building on existing analyses of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage-query relevance from the attention scores of selected heads. This approach provides a listwise solution that leverages holistic information across the entire candidate shortlist during ranking. At the same time, it naturally produces continuous relevance scores, enabling training on arbitrary retrieval datasets without requiring Likert-scale supervision. Our framework is lightweight and effective, requiring only small-scale models (e.g., 4B parameters) to achieve strong performance. Extensive experiments demonstrate that our method outperforms state-of-the-art pointwise and listwise rerankers across multiple domains, including Wikipedia and long narrative datasets. It also establishes a new state of the art on the LoCoMo benchmark, which assesses dialogue understanding and memory usage. Finally, we show that our framework supports flexible extensions: augmenting candidate passages with contextual information further improves ranking accuracy, while training attention heads from middle layers improves efficiency without sacrificing performance.
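To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of attention-based listwise scoring: given a precomputed attention tensor over a sequence that contains all candidate passages followed by the query, each passage is scored by the attention mass that query tokens assign to that passage's token span, averaged over a set of selected heads. The function and variable names (`rerank_by_attention`, `head_ids`, the span layout) are illustrative assumptions.

```python
import numpy as np

def rerank_by_attention(attn, query_span, passage_spans, head_ids):
    """Score candidate passages by query-to-passage attention mass.

    attn: array of shape (num_heads, seq_len, seq_len); row i holds the
          attention distribution of token i over all key positions.
    query_span: (start, end) token indices of the query in the sequence.
    passage_spans: list of (start, end) spans, one per candidate passage.
    head_ids: indices of the selected heads (hypothetical selection).

    Returns (order, scores): passage indices sorted by descending
    relevance, and the continuous score per passage.
    """
    qs, qe = query_span
    scores = []
    for ps, pe in passage_spans:
        # Mean attention from query tokens to this passage's tokens,
        # averaged over the selected heads -> one continuous score.
        scores.append(float(attn[head_ids, qs:qe, ps:pe].mean()))
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order, scores

# Toy layout: passage 0 at tokens [0, 3), passage 1 at [3, 6),
# query at [6, 8); two heads, both selected.
attn = np.zeros((2, 8, 8))
attn[:, 6:8, 0:3] = 0.1   # query attends weakly to passage 0
attn[:, 6:8, 3:6] = 0.3   # query attends strongly to passage 1
order, scores = rerank_by_attention(attn, (6, 8), [(0, 3), (3, 6)], [0, 1])
```

Because every candidate sits in the same context window, the attention scores are computed jointly over the whole shortlist, which is what makes the scheme listwise rather than pointwise, while the continuous scores avoid any need for Likert-scale labels.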