PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems

Production LLM deployments serve diverse workloads where cost and quality requirements vary by customer tier, time of day, and query criticality. Model serving systems accept latency SLOs directly; LLM routers do not. Instead, they force operators to tune parameters offline and guess what accuracy might result, even though the relationship between parameters and outcomes is indirect, non-monotonic, and dataset-dependent. Operators need to specify accuracy targets, not infer them from opaque settings. We present PROTEUS (Polymorphic Router for Operational Target Enforcement with Unified SLA), a router that accepts an accuracy target tau as runtime input. PROTEUS uses Lagrangian dual control: a learned dual variable lambda tracks constraint violations during training and conditions the policy network. This lets the router translate a specified tau into routing decisions that satisfy it, so a single trained model serves the full accuracy spectrum without retraining. We evaluate on RouterBench (11 models, 405K queries) and SPROUT (14 models, 45K queries). PROTEUS achieves consistent floor compliance, meaning that accuracy meets or exceeds tau, and the target-response correlation reaches 0.97 to 0.98. The closest baseline, OmniRouter, meets floors only 22% of the time despite also using Lagrangian optimization. PROTEUS operates across tau in [0.85, 0.95] from a single model. On RouterBench it achieves 90.1% accuracy, within 1.3% of the oracle. On SPROUT it achieves 94.0% accuracy, within 4.6% of the oracle. Cost savings reach 89.8% versus the best fixed model.
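
The Lagrangian dual control described above can be illustrated with a minimal sketch. It is not the PROTEUS implementation: the abstract describes a learned policy network conditioned on lambda, whereas this toy swaps in a greedy choice over fixed per-model accuracy and cost estimates. The names `Model`, `dual_update`, `route`, and `simulate`, the two example models, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    accuracy: float  # assumed expected accuracy (illustrative value)
    cost: float      # assumed cost per query (illustrative value)


# Two hypothetical models: a cheap weak one and an expensive strong one.
MODELS = [
    Model("cheap", accuracy=0.70, cost=0.1),
    Model("strong", accuracy=0.95, cost=1.0),
]


def dual_update(lam, tau, observed_accuracy, step_size=0.5):
    """Projected dual ascent: lambda grows while accuracy is below tau."""
    return max(0.0, lam + step_size * (tau - observed_accuracy))


def route(models, lam):
    """Greedy stand-in for a policy: minimize cost - lambda * accuracy."""
    return min(models, key=lambda m: m.cost - lam * m.accuracy)


def simulate(models, tau, steps=2000, step_size=0.5):
    """Alternate routing and dual updates; return (mean accuracy, lambda)."""
    lam, total = 0.0, 0.0
    for _ in range(steps):
        chosen = route(models, lam)
        total += chosen.accuracy
        lam = dual_update(lam, tau, chosen.accuracy, step_size)
    return total / steps, lam
```

In this toy setup, calling `simulate(MODELS, tau=0.9)` lets lambda rise until the strong model becomes preferable, after which the mix of cheap and strong choices drifts toward an average accuracy near tau. Changing tau changes the mix without altering the routing rule, which is the intuition behind serving multiple targets from one model.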
