The deployment of ML models on edge devices is constrained by limited computational resources and energy availability. While split computing enables the decomposition of large neural networks (NNs) and allows partial computation on both edge and cloud devices, identifying the most suitable split layer and hardware configuration is non-trivial. This process is hindered by the large configuration space, the non-linear dependencies between software and hardware parameters, heterogeneous hardware and energy characteristics, and dynamic workload conditions. To overcome this challenge, we propose DynaSplit, a two-phase framework that dynamically configures parameters across both software (i.e., split layer) and hardware (e.g., accelerator usage, CPU frequency). During the Offline Phase, we solve a multi-objective optimization problem with a meta-heuristic approach to discover optimal settings. During the Online Phase, a scheduling algorithm identifies the most suitable settings for an incoming inference request and configures the system accordingly. We evaluate DynaSplit using popular pre-trained NNs on a real-world testbed. Experimental results show a reduction in energy consumption of up to 72% compared to cloud-only computation, while meeting the latency threshold of ~90% of user requests, outperforming the baselines.
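The Online Phase described above can be illustrated with a minimal sketch: given a set of Pareto-optimal configurations discovered offline, the scheduler selects the most energy-efficient one whose latency satisfies the incoming request's threshold. All names, fields, and numbers below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an Online-Phase scheduling step (assumed structure,
# not DynaSplit's actual code): pick the lowest-energy Pareto-optimal
# configuration whose latency meets the request's threshold.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Config:
    split_layer: int       # software parameter: layer at which the NN is split
    use_accelerator: bool  # hardware parameter: edge accelerator on/off
    cpu_freq_mhz: int      # hardware parameter: edge CPU frequency
    latency_s: float       # latency profiled during the Offline Phase
    energy_j: float        # energy profiled during the Offline Phase

def schedule(pareto: list[Config], latency_threshold_s: float) -> Optional[Config]:
    """Return the most energy-efficient feasible configuration, or None."""
    feasible = [c for c in pareto if c.latency_s <= latency_threshold_s]
    if not feasible:
        return None  # a real scheduler might fall back to the fastest config
    return min(feasible, key=lambda c: c.energy_j)

# Toy Pareto front (fabricated numbers, for illustration only).
pareto = [
    Config(split_layer=4,  use_accelerator=True,  cpu_freq_mhz=1500, latency_s=0.08, energy_j=2.1),
    Config(split_layer=10, use_accelerator=False, cpu_freq_mhz=600,  latency_s=0.25, energy_j=0.9),
    Config(split_layer=0,  use_accelerator=False, cpu_freq_mhz=600,  latency_s=0.40, energy_j=0.5),
]
best = schedule(pareto, latency_threshold_s=0.30)
print(best.split_layer, best.energy_j)  # lowest-energy config meeting 0.30 s
```

Under a 0.30 s threshold, the sketch skips the lowest-energy configuration (0.40 s, infeasible) and selects the 0.25 s one, mirroring the latency-vs-energy trade-off the framework navigates per request.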