Large language models (LLMs) are increasingly applied to ranking tasks in retrieval and recommendation. Although reasoning prompting can enhance ranking utility, our preliminary exploration reveals that its benefits are inconsistent and come at substantial computational cost, suggesting that when to reason is as crucial as how to reason. To address this issue, we propose a reasoning routing framework that employs a lightweight, plug-and-play router head to decide, for each instance before generation, whether to use direct inference (Non-Think) or reasoning (Think). The router head relies solely on pre-generation signals: i) compact ranking-aware features (e.g., candidate dispersion) and ii) model-aware difficulty signals derived from a diagnostic checklist reflecting the model's estimated need for reasoning. By leveraging these features before generation, the router outputs a controllable token that determines whether to apply the Think mode. Furthermore, the router can adaptively select its operating policy along the validation Pareto frontier during deployment, enabling dynamic allocation of computational resources toward instances most likely to benefit from Think under varying system constraints. Experiments on three public ranking datasets with open-source LLMs of different scales show consistent improvements in ranking utility with reduced token consumption (e.g., +6.3\% NDCG@10 with -49.5\% tokens on MovieLens with Qwen3-4B), demonstrating reasoning routing as a practical solution to the accuracy-efficiency trade-off.
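The routing decision described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's learned router head: the feature names (`candidate_scores`, `checklist_hits`), the logistic combination, and its weights are all hypothetical, chosen only to show how pre-generation signals could map to a Think/Non-Think decision under a threshold drawn from the validation Pareto frontier.

```python
# Hypothetical sketch of pre-generation reasoning routing. The real router
# head is a learned module attached to the LLM; here we hand-set a logistic
# combination of two illustrative pre-generation features.
import math
from dataclasses import dataclass

@dataclass
class RankingInstance:
    candidate_scores: list  # preliminary relevance scores for the candidate list
    checklist_hits: int     # diagnostic-checklist items flagged as "needs reasoning"
    checklist_size: int     # total checklist items

def dispersion(scores):
    """Standard deviation of candidate scores; low dispersion suggests a harder ranking."""
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))

def route(inst, threshold):
    """Return 'Think' or 'Non-Think' using pre-generation signals only."""
    # Ranking-aware feature: inverse dispersion (tightly clustered scores -> harder).
    f_rank = 1.0 / (1.0 + dispersion(inst.candidate_scores))
    # Model-aware difficulty: fraction of checklist items indicating reasoning need.
    f_diff = inst.checklist_hits / inst.checklist_size
    # Illustrative (not learned) logistic combination of the two features.
    p_think = 1.0 / (1.0 + math.exp(-(2.0 * f_rank + 3.0 * f_diff - 2.5)))
    return "Think" if p_think >= threshold else "Non-Think"

# The threshold plays the role of the deployed operating point: sweeping it on a
# validation set traces a utility-vs-token Pareto frontier, and deployment picks
# a point on that frontier to match the current compute budget.
easy = RankingInstance(candidate_scores=[0.9, 0.5, 0.2, 0.1],
                       checklist_hits=0, checklist_size=5)
hard = RankingInstance(candidate_scores=[0.51, 0.50, 0.49, 0.50],
                       checklist_hits=4, checklist_size=5)
print(route(easy, threshold=0.5))  # well-separated candidates, no checklist flags
print(route(hard, threshold=0.5))  # near-tied candidates, most flags raised
```

Raising the threshold routes fewer instances to Think (saving tokens); lowering it spends more tokens on reasoning, which is exactly the accuracy-efficiency dial the framework exposes.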