As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, intelligent request routing (selecting the right model for each query at inference time) has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The architecture follows two complementary Shannon-inspired views. In the information-theoretic view, signal extraction reduces the entropy of the "which model?" decision by distilling routing-relevant information from raw queries. In the Boolean-algebraic view, the decision engine composes functionally complete routing policies from signal conditions. The central innovation is composable signal orchestration: thirteen heterogeneous signal types, ranging from sub-millisecond heuristics to neural classifiers for semantics, safety, and modality, are composed through configurable Boolean decision rules into deployment-specific routing policies, so that fundamentally different scenarios (multi-cloud enterprise, privacy-regulated, cost-optimized) are expressed as different configurations over the same architecture. Matched decisions drive semantic model routing via thirteen selection algorithms, while per-decision plugin chains enforce safety constraints, including a three-stage HaluGate hallucination detection pipeline, and provide a lightweight episodic memory system with ReflectionGate for personalized multi-turn context. A typed neural-symbolic DSL specifies these routing policies and compiles them to multiple deployment targets, enabling configuration-first adaptation without code changes. Together, these components show that composable signal orchestration enables a single framework to serve diverse deployment scenarios with differentiated cost, privacy, and safety policies.
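To make the Boolean-algebraic view concrete, the following is a minimal sketch of how signal conditions could be composed into routing decisions. It is not the framework's DSL or API: the signal names (`pii_detected`, `domain_math`, `short_query`, `keyword_greeting`), the model names, and the priority-based tie-breaking are all hypothetical, chosen only for illustration. The combinators AND, OR, and NOT are used because together they form a functionally complete set, so any Boolean policy over the signals can be written with them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: each extracted signal is a named boolean.
Signals = Dict[str, bool]
Condition = Callable[[Signals], bool]


def sig(name: str) -> Condition:
    """Condition that holds when the named signal fired (missing = False)."""
    return lambda s: s.get(name, False)


def all_of(*conds: Condition) -> Condition:
    """Logical AND over conditions."""
    return lambda s: all(c(s) for c in conds)


def any_of(*conds: Condition) -> Condition:
    """Logical OR over conditions."""
    return lambda s: any(c(s) for c in conds)


def not_(cond: Condition) -> Condition:
    """Logical NOT of a condition."""
    return lambda s: not cond(s)


@dataclass
class Decision:
    name: str
    condition: Condition
    model: str          # hypothetical model identifier
    priority: int = 0   # assumption: higher priority is checked first


def route(signals: Signals, decisions: List[Decision], default_model: str) -> str:
    """Return the model of the highest-priority decision whose condition holds."""
    for d in sorted(decisions, key=lambda d: d.priority, reverse=True):
        if d.condition(signals):
            return d.model
    return default_model


# A privacy-first policy; a cost-optimized deployment would reuse the same
# machinery with a different list of decisions.
DECISIONS = [
    Decision("private_data", sig("pii_detected"), "on-prem-model", priority=100),
    Decision(
        "hard_reasoning",
        all_of(sig("domain_math"), not_(sig("short_query"))),
        "large-reasoning-model",
        priority=50,
    ),
    Decision(
        "cheap_chat",
        any_of(sig("short_query"), sig("keyword_greeting")),
        "small-model",
        priority=10,
    ),
]

if __name__ == "__main__":
    print(route({"pii_detected": True, "domain_math": True}, DECISIONS, "default-model"))
    print(route({"domain_math": True}, DECISIONS, "default-model"))
```

In this sketch the policy lives entirely in the `DECISIONS` list, which mirrors the abstract's point that different deployment scenarios can be expressed as different configurations over one architecture.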