PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems

Production LLM deployments serve diverse workloads whose cost and quality requirements vary by customer tier, time of day, and query criticality. Model serving systems accept latency SLOs directly; LLM routers do not. Instead, routers force operators to tune parameters offline and guess what accuracy will result, even though the relationship between parameters and outcomes is indirect, non-monotonic, and dataset-dependent. Operators need to specify accuracy targets, not infer them from opaque settings. We present PROTEUS (Polymorphic Router for Operational Target Enforcement with Unified SLA), a router that accepts an accuracy target τ as runtime input. PROTEUS uses Lagrangian dual control: a learned dual variable λ tracks constraint violations during training and conditions the policy network, which lets the router translate a specified τ into routing decisions that satisfy it. A single trained model serves the full accuracy spectrum without retraining. We evaluate on RouterBench (11 models, 405K queries) and SPROUT (14 models, 45K queries). PROTEUS consistently achieves floor compliance (accuracy meets or exceeds τ), with a target-response correlation of 0.97 to 0.98. The closest baseline, OmniRouter, meets the floor only 22% of the time despite also using Lagrangian optimization. PROTEUS operates across τ ∈ [0.85, 0.95] from a single model. On RouterBench it achieves 90.1% accuracy, within 1.3% of the oracle; on SPROUT it achieves 94.0% accuracy, within 4.6% of the oracle. Cost savings reach up to 89.8% relative to the best fixed model.
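To make the dual-control idea concrete, here is a minimal Python sketch of projected dual ascent on λ. It rests on strong simplifying assumptions and is not the PROTEUS training procedure. The router here is a closed-form best response, not a learned policy network. Per-query correctness probabilities are assumed known. The three models and the values in `COSTS` and `P_CORRECT` are hypothetical toy numbers. The constrained problem is to minimize expected cost subject to expected accuracy ≥ τ. For a fixed λ, the Lagrangian cost − λ·(accuracy − τ) is minimized per query by choosing the model that maximizes λ·p − c. λ is then raised when accuracy falls below τ and lowered (never below zero) when accuracy exceeds it.

```python
from typing import List, Optional, Tuple

# Hypothetical toy setup (not from PROTEUS): three models ordered cheap to strong,
# a per-call cost for each, and an assumed-known probability that each model
# answers each query correctly.
COSTS: List[float] = [1.0, 4.0, 10.0]
P_CORRECT: List[List[float]] = [
    [0.90, 0.95, 0.99],
    [0.50, 0.80, 0.95],
    [0.20, 0.60, 0.90],
    [0.60, 0.70, 0.97],
]


def route(lam: float, p: List[List[float]], costs: List[float]) -> List[int]:
    """lam-conditioned routing: per query, pick argmax_k (lam * p[k] - costs[k]).

    With known p this exactly minimizes the Lagrangian cost - lam * accuracy;
    a learned policy network would only approximate it.
    """
    choices = []
    for row in p:
        scores = [lam * pk - ck for pk, ck in zip(row, costs)]
        # max() keeps the lowest index on ties (the cheaper model when costs ascend).
        choices.append(max(range(len(costs)), key=lambda k: scores[k]))
    return choices


def evaluate(lam: float, p: List[List[float]], costs: List[float]) -> Tuple[float, float]:
    """Return (mean expected accuracy, mean cost) of the routing induced by lam."""
    choices = route(lam, p, costs)
    acc = sum(row[k] for row, k in zip(p, choices)) / len(p)
    cost = sum(costs[k] for k in choices) / len(p)
    return acc, cost


def dual_ascent(
    tau: float,
    p: List[List[float]],
    costs: List[float],
    eta: float = 100.0,
    steps: int = 200,
) -> Optional[float]:
    """Projected dual ascent for: minimize cost subject to accuracy >= tau.

    Returns the cheapest lam seen whose routing met the floor, or None.
    """
    lam = 0.0
    best: Optional[Tuple[float, float]] = None  # (mean cost, lam)
    for _ in range(steps):
        acc, cost = evaluate(lam, p, costs)
        if acc >= tau and (best is None or cost < best[0]):
            best = (cost, lam)
        # The dual gradient w.r.t. lam is (tau - acc): grow lam on a violation,
        # shrink it on slack, and project back onto lam >= 0.
        lam = max(0.0, lam + eta * (tau - acc))
    return None if best is None else best[1]


if __name__ == "__main__":
    for tau in (0.85, 0.90, 0.95):
        lam_star = dual_ascent(tau, P_CORRECT, COSTS)
        if lam_star is None:
            print(f"tau={tau:.2f}: floor not reached within the step budget")
            continue
        acc, cost = evaluate(lam_star, P_CORRECT, COSTS)
        print(f"tau={tau:.2f} lam={lam_star:.2f} accuracy={acc:.4f} cost={cost:.2f}")
```

In this toy, the chosen model's correctness probability and cost are both nondecreasing in λ, so λ acts as a single monotone knob between cheap and accurate routing. Plain dual ascent oscillates around the boundary where the floor is just met. Keeping the cheapest λ that satisfied the floor is one simple way to report a compliant operating point.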

翻译：生产级大语言模型部署需应对多样化工作负载，其成本与质量要求随客户层级、时段及查询关键性动态变化。现有模型服务系统可直接接受延迟服务水平目标约束，而LLM路由系统则缺乏此能力。这迫使运维人员离线调整参数并推测可能达成的准确率，但参数与结果间存在间接、非单调且依赖数据集的复杂关系。运维人员应直接设定准确率目标，而非从隐晦参数配置中推断结果。本文提出PROTEUS（面向统一SLA操作目标执行的多态路由器），该系统可将准确率目标τ作为运行时输入。PROTEUS采用拉格朗日对偶控制机制：通过训练过程中学习的对偶变量λ追踪约束违反情况，并以此调节策略网络，使路由器能将指定的τ值转化为满足该目标的路由决策。单一训练模型即可覆盖全精度谱系需求而无需重新训练。我们在RouterBench（11个模型，40.5万次查询）和SPROUT（14个模型，4.5万次查询）数据集上进行评估。PROTEUS在准确率达成或超越τ值时始终保持底线合规性，目标-响应相关性达到0.97至0.98。最接近的基线方法OmniRouter虽同样采用拉格朗日优化，但仅能在22%的情况下满足底线要求。PROTEUS通过单一模型即可在τ∈[0.85, 0.95]区间内稳定运行：在RouterBench上实现90.1%准确率（与理论最优值差距1.3%），在SPROUT上实现94.0%准确率（与理论最优值差距4.6%）。相较于最佳固定模型方案，最高可节省89.8%的成本。