The success of Large Language Models (LLMs) has motivated a shift toward generative approaches to retrieval and ranking, aiming to supersede classical Dual Encoders (DEs) and Cross Encoders (CEs). A prominent paradigm is pointwise Autoregressive Ranking (ARR), where an LLM generates document identifiers (docIDs) token-by-token, enabling ranking via beam search. ARR promises greater expressivity than DEs while avoiding the prohibitive computational cost of CEs. However, a formal theoretical foundation for this expressive power has been missing. Moreover, the standard next-token prediction loss is rank-agnostic, making it ill-suited to finetuning an LLM for ranking tasks. In this paper, we first prove that the expressive capacity of ARR is strictly superior to that of DEs: while a DE requires an embedding dimension that grows linearly with corpus size to realize arbitrary rankings, ARR can do so with a constant hidden dimension. We then propose SToICaL (Simple Token-Item Calibrated Loss), a generalized rank-aware training loss for LLM finetuning. Through item-level reweighting and prefix-tree marginalization, SToICaL distributes probability mass over valid docID tokens according to their ground-truth relevance. Experiments on the WordNet and ESCI datasets verify that our loss suppresses invalid docID generations and significantly improves ranking metrics beyond top-1 retrieval.
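The two ingredients named above can be sketched concretely. The toy below is an illustrative stand-in, not the paper's exact formulation: it builds a prefix tree over a hypothetical docID vocabulary, renormalizes each decoding step over the valid next tokens only (prefix-tree marginalization), and then computes a cross-entropy between relevance-derived item targets and the resulting docID probabilities (item-level reweighting). All names and the toy zero-score "model" are assumptions for illustration.

```python
import math
from collections import defaultdict

def build_prefix_tree(docids):
    """Map each docID prefix to the set of valid next tokens (trie edges)."""
    tree = defaultdict(set)
    for toks in docids:
        for i in range(len(toks)):
            tree[tuple(toks[:i])].add(toks[i])
    return tree

def constrained_log_prob(logits, toks, tree):
    """Log-prob of one docID, renormalizing each step over valid tokens only."""
    lp = 0.0
    for i, t in enumerate(toks):
        prefix = tuple(toks[:i])
        valid = tree[prefix]
        scores = logits[prefix]            # model scores conditioned on prefix
        z = sum(math.exp(scores[v]) for v in valid)  # mass on valid tokens
        lp += scores[t] - math.log(z)
    return lp

def rank_aware_loss(docids, relevances, logits, tree):
    """Cross-entropy between relevance-derived targets and docID log-probs."""
    lps = [constrained_log_prob(logits, d, tree) for d in docids]
    z_rel = sum(relevances)
    targets = [r / z_rel for r in relevances]   # item-level reweighting
    return -sum(t * lp for t, lp in zip(targets, lps))

# Hypothetical corpus: three docIDs as token sequences, with graded relevance.
docids = [("d", "0", "1"), ("d", "0", "2"), ("d", "1", "0")]
tree = build_prefix_tree(docids)
# Toy "model": uniform (zero) scores for every prefix/token pair.
logits = defaultdict(lambda: defaultdict(float))
loss = rank_aware_loss(docids, [3.0, 1.0, 0.0], logits, tree)
```

Because each step renormalizes over the prefix tree's valid continuations, invalid docIDs receive zero probability by construction, and the graded targets spread supervision over all relevant items rather than only the top-1 label.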