Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing costs. As model pools grow to include dozens of frontier models separated by narrow performance gaps, existing approaches face significant challenges: manually defined task taxonomies cannot capture fine-grained capability distinctions, and a single monolithic router struggles to capture subtle capability differences across diverse tasks. We propose a two-stage routing architecture that addresses these limitations through automated fine-grained task discovery and task-aware quality estimation. The first stage uses graph-based clustering to discover latent task types and trains a classifier to assign prompts to the discovered tasks. The second stage uses a mixture-of-experts architecture with task-specific prediction heads to produce specialized quality estimates. At inference, we aggregate the predictions of both stages to balance task-level stability with prompt-specific adaptability. Evaluated on 10 benchmarks with a pool of 11 frontier models, our method consistently outperforms existing baselines and surpasses the strongest individual model while incurring less than half its cost.
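The abstract does not say how the two stages' predictions are combined at inference, so the sketch below is illustrative only. It assumes that stage one yields a probability distribution over discovered tasks and a table of per-task historical quality for each model. It further assumes that stage two's task-specific heads each output a per-model quality score and are gated by those same task probabilities, and that the two estimates are blended convexly with weight `alpha` before a linear cost penalty `cost_weight` is applied. The names `mix_by_task`, `route`, `alpha`, and `cost_weight` are hypothetical, and neither the classifier nor the expert heads are trained here; their outputs are passed in as plain lists.

```python
from typing import Sequence


def mix_by_task(task_probs: Sequence[float],
                per_task_scores: Sequence[Sequence[float]]) -> list[float]:
    """Weight each task's per-model scores by its probability: sum_t p[t] * S[t][m]."""
    n_models = len(per_task_scores[0])
    return [
        sum(p * row[m] for p, row in zip(task_probs, per_task_scores))
        for m in range(n_models)
    ]


def route(task_probs: Sequence[float],
          task_quality: Sequence[Sequence[float]],
          head_scores: Sequence[Sequence[float]],
          costs: Sequence[float],
          alpha: float = 0.5,
          cost_weight: float = 0.0) -> int:
    """Return the index of the model with the highest cost-penalized blended quality.

    Hypothetical aggregation rule (an assumption, not the paper's stated method):
    utility[m] = alpha * task_level[m] + (1 - alpha) * prompt_level[m] - cost_weight * costs[m]
    """
    # Stage 1 (assumed form): expected historical quality under the task posterior.
    task_level = mix_by_task(task_probs, task_quality)
    # Stage 2 (assumed form): task-specific expert heads gated by the same task probabilities.
    prompt_level = mix_by_task(task_probs, head_scores)
    # Aggregate both stages, then trade quality off against cost.
    utility = [
        alpha * t + (1 - alpha) * p - cost_weight * c
        for t, p, c in zip(task_level, prompt_level, costs)
    ]
    return max(range(len(utility)), key=utility.__getitem__)


if __name__ == "__main__":
    # Toy inputs: 2 discovered tasks, 3 candidate models (all numbers made up).
    probs = [0.8, 0.2]
    quality = [[0.70, 0.85, 0.90], [0.60, 0.50, 0.88]]
    heads = [[0.75, 0.86, 0.87], [0.55, 0.52, 0.90]]
    cost = [0.1, 0.3, 1.0]
    print(route(probs, quality, heads, cost))                   # quality only
    print(route(probs, quality, heads, cost, cost_weight=0.2))  # cost-aware
```

With these toy numbers, the quality-only call selects the most expensive model (index 2). Adding a modest cost penalty shifts the choice to the cheaper model at index 1, whose blended quality is only slightly lower. This mirrors, in miniature, the quality-versus-cost trade-off the abstract describes. Setting `alpha` closer to 1 favors the more stable task-level estimate, and setting it closer to 0 favors the prompt-specific one.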