Cloud-based Artificial Intelligence (AI) inference is increasingly latency- and context-sensitive, yet today's AI-as-a-Service is typically consumed as an application-chosen endpoint, leaving the network to provide only best-effort transport. This decoupling prevents enforceable tail-latency guarantees, compute-aware admission control, and continuity under mobility. This paper proposes Network-Exposed AI-as-a-Service (NE-AIaaS), built around a new service primitive: the AI Session (AIS), a contractual object that binds model identity, execution placement, transport Quality-of-Service (QoS), and consent/charging scope into a single lifecycle with explicit failure semantics. We introduce the AI Service Profile (ASP), a compact contract that expresses task modality and measurable service objectives (e.g., time-to-first-response/token, p99 latency, success probability) alongside privacy and mobility constraints. On this basis, we specify protocol-grade procedures for (i) DISCOVER (model/site discovery), (ii) AI PAGING (context-aware selection of an execution anchor), (iii) a two-phase PREPARE/COMMIT that atomically co-reserves compute and QoS resources, and (iv) make-before-break MIGRATION for session continuity. The design maps to existing standards: Common API Framework (CAPIF)-style northbound exposure, ETSI Multi-access Edge Computing (MEC) execution substrates, 5G QoS flows for transport enforcement, and Network Data Analytics Function (NWDAF)-style analytics for closed-loop paging/migration triggers.
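To make the ASP concrete, the contract described above can be sketched as a small typed record pairing a task modality with its measurable service objectives and constraints. All field and method names below are illustrative assumptions, not identifiers defined by the paper or any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIServiceProfile:
    """Compact contract: task modality plus measurable service objectives,
    alongside privacy and mobility constraints (field names are assumed)."""
    task_modality: str                    # e.g. "text-generation", "vision"
    ttfr_ms: int                          # time-to-first-response/token target
    p99_latency_ms: int                   # tail-latency bound
    success_probability: float            # e.g. 0.999
    privacy_scope: str = "on-site-only"   # placement/privacy constraint
    mobility: str = "session-continuity"  # continuity requirement under mobility

    def admits(self, observed_p99_ms: float, observed_success: float) -> bool:
        """Check whether an execution site's observed metrics satisfy the ASP,
        e.g. as an input to compute-aware admission control."""
        return (observed_p99_ms <= self.p99_latency_ms
                and observed_success >= self.success_probability)
```

A site reporting a 320 ms p99 latency and 0.9995 success rate would satisfy an ASP requiring 400 ms and 0.999, while a 450 ms p99 would be rejected at admission.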
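The two-phase PREPARE/COMMIT procedure can be illustrated with a minimal sketch: a session is established only if both the compute reservation at the execution site and the transport QoS reservation succeed in the PREPARE phase; a failure on either side releases any tentative hold, so no resource is left half-reserved. The `Resource` abstraction and function names are assumptions for illustration, not protocol-defined entities.

```python
class Resource:
    """A reservable resource with limited capacity
    (e.g. GPU slots at an edge site, or a QoS flow budget)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.held = 0

    def prepare(self, amount: int) -> bool:
        """PREPARE phase: tentatively hold capacity, refusing over-commitment."""
        if self.held + amount > self.capacity:
            return False
        self.held += amount
        return True

    def release(self, amount: int) -> None:
        """Abort path: undo a tentative hold."""
        self.held -= amount


def establish_session(compute: Resource, qos: Resource,
                      gpu_slots: int, qos_flow: int) -> bool:
    """Co-reserve compute and QoS atomically: COMMIT only if both
    PREPAREs succeed, otherwise roll back the partial reservation."""
    if not compute.prepare(gpu_slots):
        return False                     # compute-aware admission refused
    if not qos.prepare(qos_flow):
        compute.release(gpu_slots)       # never hold compute without transport QoS
        return False
    return True                          # COMMIT: the AI Session becomes active
```

A failed attempt leaves both resources exactly as they were, which is the atomicity property the abstract attributes to the PREPARE/COMMIT exchange.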