\emph{Integrated communication and computation} (IC$^2$) has emerged as a new paradigm for enabling efficient edge inference in sixth-generation (6G) networks. However, the design of IC$^2$ technologies is hindered by the lack of a tractable theoretical framework for characterizing \emph{end-to-end} (E2E) inference performance. This metric is difficult to characterize, as it must account for both channel distortion and the architecture and computational complexity of the artificial intelligence (AI) model. In this work, we address this challenge by developing a tractable analytical model for E2E inference accuracy and leveraging it to design a \emph{channel-adaptive AI} algorithm that maximizes inference throughput, referred to as the edge processing rate (EPR), under latency and accuracy constraints. Specifically, we consider an edge inference system in which a server deploys a backbone model with early exit, which enables flexible computational complexity, to perform inference on data features transmitted by a mobile device. The proposed accuracy model characterizes high-dimensional feature distributions in the angular domain using a Mixture of von Mises (MvM) distribution. This yields a closed-form expression for inference accuracy as a function of quantization bit-width and model traversal depth, which represent channel distortion and computational complexity, respectively. Building on this accuracy model, we formulate and solve the EPR maximization problem under joint latency and accuracy constraints, leading to a channel-adaptive AI algorithm that achieves full IC$^2$ integration. The proposed algorithm jointly adapts transmit-side feature compression and receive-side model complexity according to channel conditions to maximize overall efficiency and inference throughput. Experimental results demonstrate its superior performance compared with fixed-complexity counterparts.
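As a point of reference for the angular-domain model mentioned above, a Mixture of von Mises distribution over a feature angle $\theta$ takes the standard form below; the symbols $K$, $\pi_k$, $\mu_k$, and $\kappa_k$ are illustrative and not necessarily the paper's notation:
\begin{equation*}
p(\theta) = \sum_{k=1}^{K} \pi_k \, \frac{e^{\kappa_k \cos(\theta - \mu_k)}}{2\pi I_0(\kappa_k)}, \qquad \sum_{k=1}^{K} \pi_k = 1,
\end{equation*}
where $I_0(\cdot)$ is the modified Bessel function of the first kind of order zero, $\mu_k$ and $\kappa_k$ are the mean direction and concentration of the $k$-th component, and $\pi_k$ is its mixing weight.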