CANS: Accelerating Multiuser Collaborative Edge Inference via Cooperative Autodidactic NeuroSurgeon

Mobile edge computing (MEC)-enabled collaborative deep neural network (DNN) inference has recently emerged as a promising approach for delivering intelligent services to resource-constrained mobile devices. A representative scenario is multi-user collaborative edge inference, in which different devices independently partition their DNN models and offload the back-end computation to a shared edge server over wireless networks. However, determining the optimal DNN partition for each device is challenging because system conditions are unknown and time-varying, including fluctuating wireless links and diverse device capabilities. To address this problem, we propose Cooperative Autodidactic NeuroSurgeon (CANS), a collaborative edge inference framework that enables devices to adaptively learn optimal DNN partitions by sharing informative feedback during online inference. To handle device heterogeneity and better leverage offline inference experience, we integrate a novel FedLinUCB-DW algorithm that groups devices of the same type and warm-starts online exploration using local offline early-exit inference experience. Furthermore, we provide theoretical guarantees for FedLinUCB-DW by deriving a regret upper bound. We validate our method both in a simulated environment and on a hardware prototype system. Empirical evaluations demonstrate that CANS achieves lower inference latency than state-of-the-art baselines. In particular, in prototype experiments on two edge devices, CANS reduces average inference latency by up to 50% compared with the non-cooperative baseline.
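To make the idea of learning a partition point from shared feedback more concrete, the following is a minimal sketch of a linear contextual bandit whose statistics are shared by two devices of the same type and warm-started from offline samples. It is not the FedLinUCB-DW algorithm itself, whose details the abstract does not give. The class `SharedLinUCB`, the feature vectors in `CONTEXTS`, the cost vector `TRUE_THETA`, and the linear latency model are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical per-partition features: [front-end work on the device,
# intermediate data to transmit, back-end work on the server], normalized.
CONTEXTS = [
    [0.0, 1.0, 1.0],  # offload everything (send the raw input)
    [0.2, 0.3, 0.8],  # early split
    [0.6, 0.1, 0.4],  # late split
    [1.0, 0.0, 0.0],  # run everything locally
]
# Hypothetical unknown cost per unit of each feature (slow device, fast server).
TRUE_THETA = [2.0, 1.0, 0.2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class SharedLinUCB:
    """Ridge-regression bandit; same-type devices update one shared instance."""

    def __init__(self, dim, alpha=0.3, ridge=0.01):
        self.dim = dim
        self.alpha = alpha  # exploration weight (assumed value)
        # Inverse of the regularized Gram matrix, initially (ridge * I)^-1.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, x):
        return [dot(row, x) for row in self.A_inv]

    def theta(self):
        # Current ridge-regression estimate of the cost vector.
        return self._matvec(self.b)

    def score(self, x):
        # Latency is minimized, so use an optimistic *lower* confidence bound.
        width = math.sqrt(max(dot(x, self._matvec(x)), 0.0))
        return dot(self.theta(), x) - self.alpha * width

    def select(self, contexts):
        return min(range(len(contexts)), key=lambda k: self.score(contexts[k]))

    def update(self, x, latency):
        # Sherman-Morrison rank-one update of A_inv (A_inv is symmetric).
        Ax = self._matvec(x)
        denom = 1.0 + dot(x, Ax)
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += latency * x[i]


def run_demo(seed=0, offline_per_arm=10, rounds=50, noise=0.02):
    rng = random.Random(seed)

    def observe(x):
        # Simulated latency measurement under the assumed linear model.
        return dot(TRUE_THETA, x) + rng.gauss(0.0, noise)

    learner = SharedLinUCB(dim=3)
    # Warm start: replay offline measurements before any online decision.
    for x in CONTEXTS:
        for _ in range(offline_per_arm):
            learner.update(x, observe(x))
    picks = []
    # Online phase: two same-type devices feed the same shared statistics.
    for _ in range(rounds):
        for _device in ("device_a", "device_b"):
            k = learner.select(CONTEXTS)
            learner.update(CONTEXTS[k], observe(CONTEXTS[k]))
            picks.append(k)
    return learner, picks
```

Calling `run_demo()` returns the shared learner and the list of partition indices chosen online. Under the assumed costs, the early split (index 1) has the lowest expected latency, and because both devices contribute to one set of statistics, each device benefits from the other's observations.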
