One of the key missions of sixth-generation (6G) mobile networks is to deploy large-scale artificial intelligence (AI) models at the network edge to provide remote-inference services for edge devices. The resulting platform, known as edge inference, will support a wide range of Internet-of-Things applications, such as autonomous driving, industrial automation, and augmented reality. Given the mission-critical and time-sensitive nature of these tasks, edge inference systems must be designed to be both reliable and capable of meeting stringent end-to-end (E2E) latency constraints. Existing studies primarily focus on communication reliability, as characterized by the channel outage probability, and may therefore fail to guarantee E2E performance in terms of inference accuracy and latency. To address this limitation, we propose a theoretical framework that introduces and mathematically characterizes the inference outage (InfOut) probability, which quantifies the likelihood that the E2E inference accuracy falls below a target threshold. Under an E2E latency constraint, this framework establishes a fundamental tradeoff between communication overhead (i.e., uploading more sensor observations) and inference reliability as quantified by the InfOut probability. To make this tradeoff tractable to optimize, we derive accurate surrogate functions for the InfOut probability by applying a Gaussian approximation to the distribution of the received discriminant gain. Experimental results show that the proposed design outperforms conventional communication-centric approaches in terms of E2E inference reliability.
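The idea behind the Gaussian surrogate can be illustrated with a toy model. This minimal sketch rests on assumptions that are not taken from the framework above and are purely hypothetical:

- The accuracy threshold maps to a threshold on the received discriminant gain `G`.
- `G` is a sum of `K` independent per-observation contributions.
- Each contribution is a unit gain scaled by an Exp(1) fading power (Rayleigh fading).

Under these assumptions, the outage probability P(G < threshold) is approximated by Φ((threshold − μ)/σ), where μ and σ are matched to the mean and standard deviation of `G`. The sketch compares that approximation with a Monte Carlo estimate. The names `simulate_gain`, `gaussian_outage`, and `compare` are illustrative only.

```python
import random
from statistics import NormalDist


def simulate_gain(num_obs: int, unit_gain: float, rng: random.Random) -> float:
    """Hypothetical received discriminant gain: each of num_obs uploaded
    observations adds unit_gain scaled by an Exp(1) fading power
    (Rayleigh-fading assumption, not the paper's model)."""
    return sum(unit_gain * rng.expovariate(1.0) for _ in range(num_obs))


def gaussian_outage(mean: float, std: float, threshold: float) -> float:
    """Gaussian surrogate for the outage probability: P(G < threshold) ~= Phi((threshold - mean) / std)."""
    return NormalDist(mean, std).cdf(threshold)


def empirical_outage(samples: list[float], threshold: float) -> float:
    """Fraction of Monte Carlo samples whose gain falls below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


def compare(num_obs: int, unit_gain: float = 1.0, threshold: float = 5.0,
            trials: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Return (empirical outage, Gaussian-surrogate outage) for num_obs observations."""
    rng = random.Random(seed)
    samples = [simulate_gain(num_obs, unit_gain, rng) for _ in range(trials)]
    # Moment matching for a sum of num_obs i.i.d. unit_gain * Exp(1) terms:
    # mean = K * a, variance = K * a^2.
    mean = num_obs * unit_gain
    std = (num_obs ** 0.5) * unit_gain
    return empirical_outage(samples, threshold), gaussian_outage(mean, std, threshold)


if __name__ == "__main__":
    # Uploading more observations (larger K) raises the mean gain and lowers the
    # outage probability, at the cost of more communication overhead.
    for k in (10, 15, 20):
        emp, approx = compare(k)
        print(f"K={k:2d}  empirical={emp:.4f}  gaussian={approx:.4f}")
```

The closed-form surrogate is a smooth function of `K`, so it can be optimized directly. The Monte Carlo estimate is printed beside it so the approximation error of this toy model can be inspected rather than assumed.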