High-Fidelity Network Management for Federated AI-as-a-Service: Cross-Domain Orchestration

To support the emergence of AI-as-a-Service (AIaaS), communication service providers (CSPs) are on the verge of a radical transformation: from pure connectivity providers to providers of AIaaS as a managed network service, i.e., a control-and-orchestration plane that exposes AI models. In this model, the CSP is responsible not only for transport and communications but also for intent-to-model resolution and joint network-compute orchestration, so that delivery is reliable and timely end to end. The resulting end-to-end AIaaS service is thus governed by both communication impairments (delay, loss) and inference impairments (latency, error). A central open problem is the lack of an operational AIaaS control-and-orchestration framework that enforces high fidelity, particularly under multi-domain federation. This paper introduces an assurance-oriented AIaaS management plane based on Tail-Risk Envelopes (TREs): signed, composable per-domain descriptors that combine deterministic guardrails with stochastic rate-latency-impairment models. Using stochastic network calculus, we derive bounds on end-to-end delay violation probabilities across tandem domains and obtain an optimization-ready risk-budget decomposition. We show that tenant-level reservations prevent bursty traffic from inflating tail latency under TRE contracts. An auditing layer then uses runtime telemetry to estimate extreme-percentile performance, quantify uncertainty, and attribute tail risk to each domain for accountability. Packet-level Monte Carlo simulations demonstrate improved p99.9 compliance under overload via admission control, as well as robust tenant isolation under correlated burstiness.
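As an illustration of what a risk-budget decomposition across tandem domains can look like, the following minimal Python sketch splits an end-to-end violation budget among domains. It is not the paper's TRE model: it assumes each domain i advertises a hypothetical exponential tail bound P(D_i > d) <= a_i * exp(-theta_i * d), and it composes the per-domain budgets with a plain union bound, which holds without any independence assumption. The function name `split_risk_budget` and all parameter values are illustrative only. Under this assumed tail form, allocating the budget in proportion to 1/theta_i minimizes the summed delay bound.

```python
import math

def split_risk_budget(domains, eps_total):
    """Split an end-to-end violation budget across tandem domains.

    domains: list of (a, theta) pairs, assuming each domain satisfies
             P(D_i > d) <= a * exp(-theta * d)  (hypothetical tail model).
    Returns a list of (eps_i, d_i): per-domain risk budget and delay bound.
    By the union bound, P(sum D_i > sum d_i) <= sum eps_i = eps_total.
    """
    # Minimizing sum(d_i) subject to sum(eps_i) = eps_total gives
    # eps_i proportional to 1/theta_i for this tail form.
    inv = [1.0 / theta for _, theta in domains]
    total = sum(inv)
    plan = []
    for (a, theta), w in zip(domains, inv):
        eps_i = eps_total * w / total
        # Invert the tail bound; clamp at zero when the budget exceeds a.
        d_i = max(0.0, math.log(a / eps_i) / theta)
        plan.append((eps_i, d_i))
    return plan

# Hypothetical three-domain path; delays in milliseconds, theta in 1/ms.
domains = [(1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
plan = split_risk_budget(domains, 1e-3)
e2e_delay = sum(d for _, d in plan)
e2e_risk = sum(e for e, _ in plan)
print(f"end-to-end delay bound: {e2e_delay:.2f} ms at risk {e2e_risk:.1e}")
```

In this sketch, the domain with the heaviest tail (smallest theta) receives the largest share of the risk budget, since tightening its budget is the most expensive in delay terms.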
