Large Language Models (LLMs) have recently been explored as fine-grained zero-shot re-rankers that leverage attention signals to estimate document relevance. However, existing methods either aggregate attention signals across all heads or rely on a statically selected subset identified by heuristic rules. Such static selection can be suboptimal because the informative heads vary across queries and domains; moreover, naively combining multiple heads can degrade performance due to redundant or conflicting ranking signals. In this paper, we propose RouteHead, a query-dependent head selection method for attention-based re-ranking with LLMs. Specifically, we learn a lightweight router that maps each query to a query-specific set of heads, and relevance scores are computed by aggregating attention signals only from these heads. Since ground-truth query-to-head assignments are unavailable, we first construct pseudo labels via an offline search. The router represents each head with a learnable embedding, represents each query with an embedding extracted from the hidden states of the frozen LLM, and is trained on the pseudo labels with a sparsity regularizer. Experiments on diverse benchmarks and multiple LLM backbones show that the proposed method consistently outperforms strong baselines.
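A minimal sketch of how such a query-dependent router could be parameterized and trained, assuming a PyTorch setting: the names `HeadRouter`, `router_loss`, and `rerank_scores`, the sigmoid/BCE formulation, and the top-k aggregation rule are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch (assumptions): a router that scores each attention head of a
# frozen LLM per query, trained against offline pseudo labels with an L1 sparsity term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadRouter(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_router: int = 128):
        super().__init__()
        # One learnable embedding per attention head of the frozen LLM.
        self.head_emb = nn.Parameter(torch.randn(n_heads, d_router) * 0.02)
        # Projects the query representation (taken from the frozen LLM's hidden
        # states, e.g. the last-token hidden state) into the router space.
        self.query_proj = nn.Linear(d_model, d_router)

    def forward(self, query_hidden: torch.Tensor) -> torch.Tensor:
        # query_hidden: (batch, d_model)
        q = self.query_proj(query_hidden)          # (batch, d_router)
        logits = q @ self.head_emb.T               # (batch, n_heads)
        return torch.sigmoid(logits)               # per-head selection weights in [0, 1]

def router_loss(weights, pseudo_labels, l1_coef: float = 0.01):
    # Binary cross-entropy against the offline pseudo labels plus an L1 sparsity
    # regularizer that discourages selecting many heads.
    bce = F.binary_cross_entropy(weights, pseudo_labels.float())
    return bce + l1_coef * weights.abs().mean()

def rerank_scores(attn_scores, weights, k: int = 8):
    # attn_scores: (batch, n_heads, n_docs) attention-derived relevance per head;
    # keep only the k heads the router weights highest and average their signals.
    topk = weights.topk(k, dim=-1).indices        # (batch, k)
    picked = torch.gather(
        attn_scores, 1, topk.unsqueeze(-1).expand(-1, -1, attn_scores.size(-1))
    )                                             # (batch, k, n_docs)
    return picked.mean(dim=1)                     # (batch, n_docs)
```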