Cloud-based Artificial Intelligence (AI) inference is increasingly latency- and context-sensitive, yet today's AI-as-a-Service is typically consumed as an application-chosen endpoint, leaving the network to provide only best-effort transport. This decoupling prevents enforceable tail-latency guarantees, compute-aware admission control, and continuity under mobility. This paper proposes Network-Exposed AI-as-a-Service (NE-AIaaS) built around a new service primitive: the AI Session (AIS), a contractual object that binds model identity, execution placement, transport Quality-of-Service (QoS), and consent/charging scope into a single lifecycle with explicit failure semantics. We introduce the AI Service Profile (ASP), a compact contract that expresses task modality and measurable service objectives (e.g., time-to-first-response/token, p99 latency, success probability) alongside privacy and mobility constraints. On this basis, we specify protocol-grade procedures for (i) DISCOVER (model/site discovery), (ii) AI PAGING (context-aware selection of the execution anchor), (iii) a two-phase PREPARE/COMMIT that atomically co-reserves compute and QoS resources, and (iv) make-before-break MIGRATION for session continuity. The design maps to existing standards: Common API Framework (CAPIF)-style northbound exposure, ETSI Multi-access Edge Computing (MEC) execution substrates, 5G QoS flows for transport enforcement, and Network Data Analytics Function (NWDAF)-style analytics for closed-loop paging/migration triggers.
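The two-phase PREPARE/COMMIT procedure described above can be sketched as a small state machine. This is a minimal illustrative sketch, not the paper's specification: the class and method names (`AISession`, `prepare`, `commit`, `release`) and the boolean reservation outcomes are assumptions introduced here to show the atomic co-reservation and explicit-failure semantics.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AISState(Enum):
    """Assumed lifecycle states for an AI Session (AIS)."""
    IDLE = auto()
    PREPARED = auto()   # compute and QoS tentatively co-reserved
    ACTIVE = auto()     # reservations committed; session serving traffic
    FAILED = auto()     # explicit failure state per the AIS contract


@dataclass
class AISession:
    """Hypothetical AIS object binding model identity, placement, and QoS."""
    model_id: str
    site: str            # execution anchor chosen by AI PAGING
    qos_profile: str     # 5G QoS flow binding for transport enforcement
    state: AISState = AISState.IDLE

    def prepare(self, compute_reserved: bool, qos_reserved: bool) -> bool:
        # Phase 1: co-reserve compute and transport QoS atomically.
        # If either reservation fails, the session moves to FAILED so
        # the peer can release any partial reservation it holds.
        if self.state is not AISState.IDLE:
            return False
        if compute_reserved and qos_reserved:
            self.state = AISState.PREPARED
            return True
        self.state = AISState.FAILED
        return False

    def commit(self) -> bool:
        # Phase 2: commit is only legal from PREPARED; any other state
        # refuses, preserving the explicit failure semantics.
        if self.state is AISState.PREPARED:
            self.state = AISState.ACTIVE
            return True
        return False

    def release(self) -> None:
        # Tear down from any state; reservations are assumed to be
        # returned to the compute/QoS substrates here.
        self.state = AISState.IDLE
```

Under this sketch, a make-before-break MIGRATION would PREPARE a second `AISession` at the target site and COMMIT it before releasing the source session.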