Cooperative inference in Mobile Edge Computing (MEC), in which a Deep Neural Network (DNN) model is partitioned and deployed across resource-constrained user equipments (UEs) and edge servers (ESs), has emerged as a promising paradigm. First, we consider scenarios with continuous arrivals of Artificial Intelligence (AI) tasks, such as object detection on video streams, and use a serial queuing model to accurately evaluate the End-to-End (E2E) delay of cooperative edge inference. Second, to improve the long-term performance of the inference system, we formulate a multi-slot stochastic E2E delay minimization problem that jointly considers model partitioning and multi-dimensional resource allocation. Finally, to solve this problem, we propose a Lyapunov-guided Multi-Dimensional Optimization algorithm (LyMDO) that decouples the original problem into per-slot deterministic subproblems, in which Deep Reinforcement Learning (DRL) and convex optimization jointly optimize the partitioning decisions and the complementary resource allocation. Simulation results show that our approach effectively reduces E2E delay while balancing long-term resource constraints.
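The abstract does not specify the queue disciplines, the energy model, or which resource carries the long-term constraint, so the following is only a minimal sketch under stated assumptions, not LyMDO itself. It assumes the UE computation, uplink transmission, and ES computation stages form a series of M/M/1 queues fed by Poisson task arrivals, so the mean E2E delay is the sum of per-stage sojourn times 1/(μ − λ). It also assumes the long-term constraint is an average UE power budget, enforced through a Lyapunov virtual queue Z(t+1) = max(Z(t) + e(t) − ē, 0), and that each slot minimizes the standard drift-plus-penalty objective V·delay + Z·power. Exhaustive search over the partition point and a small UE frequency grid stands in for the DRL agent and convex solver described above. All names (`SystemParams`, `tandem_delay`, `slot_metrics`, `simulate`) and all numbers are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SystemParams:
    """Hypothetical system profile; none of these values come from the paper."""
    cycles: list[float]    # CPU cycles of each DNN layer
    out_bits: list[float]  # out_bits[p]: bits uploaded when the first p layers run on the UE
    f_es: float            # ES CPU frequency (cycles/s), fixed for simplicity
    rate_bps: float        # uplink rate (bits/s), fixed for simplicity
    p_tx: float            # UE transmit power (W)
    kappa: float           # effective switched capacitance in the kappa*f^2 energy model
    power_budget: float    # long-term average UE power budget (W)


def tandem_delay(lam, service_times):
    """Mean sojourn time of M/M/1 queues in series (assumption: Poisson arrivals,
    exponential service). A zero service time means the stage is bypassed."""
    total = 0.0
    for s in service_times:
        if s == 0.0:
            continue
        mu = 1.0 / s
        if mu <= lam:
            return math.inf  # this stage is unstable at arrival rate lam
        total += 1.0 / (mu - lam)
    return total


def slot_metrics(params, p, f_ue, lam):
    """E2E delay and UE power when the first p layers run locally at frequency f_ue."""
    ue_cycles = sum(params.cycles[:p])
    es_cycles = sum(params.cycles[p:])
    t_ue = ue_cycles / f_ue
    # Fully local inference (p == number of layers) uploads nothing; result return is ignored.
    t_tx = 0.0 if p == len(params.cycles) else params.out_bits[p] / params.rate_bps
    t_es = es_cycles / params.f_es
    delay = tandem_delay(lam, [t_ue, t_tx, t_es])
    energy_per_task = params.kappa * f_ue**2 * ue_cycles + params.p_tx * t_tx
    return delay, lam * energy_per_task  # average UE power in this slot


def simulate(params, f_ue_grid, slots=200, V=10.0, seed=0):
    """Drift-plus-penalty control: each slot minimizes V*delay + Z*power."""
    rng = random.Random(seed)
    Z = 0.0  # virtual queue tracking accumulated violation of the power budget
    history = []
    for _ in range(slots):
        lam = rng.uniform(2.0, 6.0)  # hypothetical task arrival rate (tasks/s)
        best = None
        for p in range(len(params.cycles) + 1):  # brute force replaces the DRL policy
            for f_ue in f_ue_grid:               # brute force replaces the convex step
                delay, power = slot_metrics(params, p, f_ue, lam)
                score = V * delay + Z * power
                if best is None or score < best[0]:
                    best = (score, p, f_ue, delay, power)
        _, p, f_ue, delay, power = best
        Z = max(Z + power - params.power_budget, 0.0)  # virtual queue update
        history.append((p, f_ue, delay, power, Z))
    return history


if __name__ == "__main__":
    params = SystemParams(
        cycles=[2e8, 4e8, 4e8, 2e8],
        out_bits=[4e6, 2e6, 1e6, 5e5, 1e4],
        f_es=1e10, rate_bps=2e7, p_tx=0.5, kappa=1e-28, power_budget=0.5,
    )
    hist = simulate(params, f_ue_grid=[0.5e9, 1.0e9, 1.5e9])
    avg_delay = sum(h[2] for h in hist) / len(hist)
    avg_power = sum(h[3] for h in hist) / len(hist)
    print(f"avg E2E delay {avg_delay:.3f} s, avg UE power {avg_power:.3f} W")
```

Raising `V` weights delay more heavily relative to the power backlog `Z`, which is the usual delay/constraint trade-off knob in drift-plus-penalty methods. Exhaustive search is only tractable here because the grid is tiny; with many UEs and finer resource dimensions the joint search space grows quickly, which is presumably why a learned partitioning policy combined with convex resource allocation is used instead.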