Large language model (LLM) routers improve the efficiency of multi-model systems by directing each query to the most appropriate model while leveraging the diverse strengths of heterogeneous LLMs. Most existing approaches frame routing as a classification problem based solely on the input query. While this reduces overhead by avoiding inference across all models, it overlooks valuable information that could be gleaned from potential outputs and fails to capture implicit intent or contextual nuances that often emerge only during response generation. These limitations can result in suboptimal routing decisions, particularly for complex or ambiguous queries that require deeper semantic understanding. To address this challenge, we propose Lookahead, a routing framework that "foresees" potential model outputs by predicting their latent representations and uses these predictions to guide model selection, thus enabling more informed routing without full inference. Within this framework, we implement two approaches based on causal and masked language models. Empirical evaluations across seven public benchmarks, spanning instruction following, mathematical reasoning, and code generation, show that Lookahead consistently outperforms existing routing baselines, achieving an average performance gain of 7.7% over the state-of-the-art. Our code is available at https://github.com/huangcb01/lookahead-routing.
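The core idea above, predicting a latent representation of each candidate model's would-be response and ranking models on those predictions rather than on the raw query alone, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the linear per-model latent predictors, the shared scoring vector, and all dimensions are hypothetical assumptions standing in for learned components.

```python
# Schematic sketch of Lookahead-style routing (illustrative only).
# A learned predictor maps a query embedding to a predicted latent
# representation of each candidate model's response; a scorer then
# ranks models on those predicted latents, so no candidate model
# actually runs full inference at routing time.
import numpy as np

rng = np.random.default_rng(0)
D_QUERY, D_LATENT, N_MODELS = 16, 8, 3

# Stand-ins for trained parameters: one latent predictor per model,
# plus a shared scoring vector over predicted response latents.
predictors = [rng.normal(size=(D_LATENT, D_QUERY)) for _ in range(N_MODELS)]
scorer = rng.normal(size=D_LATENT)

def route(query_emb: np.ndarray) -> int:
    """Return the index of the model whose predicted ("foreseen")
    output latent scores highest under the quality proxy."""
    latents = [W @ query_emb for W in predictors]  # predicted output latents
    scores = [float(scorer @ z) for z in latents]  # scalar quality proxy
    return int(np.argmax(scores))

query_emb = rng.normal(size=D_QUERY)
choice = route(query_emb)
print(f"routed query to model {choice}")
```

In practice the predictors and scorer would be trained networks (e.g. heads on a causal or masked language model, as the two approaches in the paper suggest), but the routing decision has the same shape: predict, score, argmax.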