Computational offloading is a promising approach for overcoming resource constraints on client devices by moving some or all of an application's computations to remote servers. With the advent of specialized hardware accelerators, client devices can now perform fast local processing of specific tasks, such as machine learning inference, reducing the need for offloading computations. However, edge servers with accelerators also offer faster processing for offloaded tasks than was previously possible. In this paper, we present an analytic and experimental comparison of on-device processing and edge offloading for a range of accelerator, network, multi-tenant, and application workload scenarios, with the goal of understanding when to use local on-device processing and when to offload computations. We present models that leverage analytical queuing results to derive explainable closed-form equations for the expected end-to-end latencies of both strategies, which yield precise, quantitative performance crossover predictions that guide adaptive offloading. We experimentally validate our models across a range of scenarios and show that they achieve a mean absolute percentage error of 2.2% compared to observed latencies. We further use our models to develop a resource manager for adaptive offloading and show its effectiveness under variable network conditions and dynamic multi-tenant edge settings.
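The paper's actual closed-form equations are not reproduced in this abstract, but the decision it describes can be illustrated with a minimal sketch. The snippet below uses simple M/M/1 sojourn times as stand-in latency models: on-device latency is pure queueing at the local accelerator, while offload latency adds network transfer and queueing at a (possibly multi-tenant) edge server. All function names, parameters, and numbers here are hypothetical illustrations, not the paper's model.

```python
# Illustrative sketch only (NOT the paper's equations): compare expected
# end-to-end latency of on-device processing vs. edge offloading using
# M/M/1 queueing terms. All parameters are hypothetical.

def local_latency(arrival_rate: float, local_service_rate: float) -> float:
    """Expected on-device latency: M/M/1 sojourn time 1 / (mu - lambda)."""
    assert local_service_rate > arrival_rate, "local queue must be stable"
    return 1.0 / (local_service_rate - arrival_rate)

def offload_latency(arrival_rate: float, edge_service_rate: float,
                    rtt_s: float, payload_bits: float, bandwidth_bps: float,
                    edge_background_rate: float = 0.0) -> float:
    """Expected offload latency: network transfer (RTT + serialization)
    plus M/M/1 sojourn at an edge server shared with background tenants."""
    total_rate = arrival_rate + edge_background_rate
    assert edge_service_rate > total_rate, "edge queue must be stable"
    network = rtt_s + payload_bits / bandwidth_bps
    queueing = 1.0 / (edge_service_rate - total_rate)
    return network + queueing

def should_offload(arrival_rate: float, local_service_rate: float,
                   edge_service_rate: float, rtt_s: float,
                   payload_bits: float, bandwidth_bps: float,
                   edge_background_rate: float = 0.0) -> bool:
    """Crossover decision: offload only if the modeled offload latency
    beats the modeled on-device latency."""
    return offload_latency(arrival_rate, edge_service_rate, rtt_s,
                           payload_bits, bandwidth_bps,
                           edge_background_rate) < local_latency(
                               arrival_rate, local_service_rate)

# Example: 10 req/s, slow local accelerator (mu = 15/s) vs. fast edge
# (mu = 100/s) over a 100 Mbps link -> offloading wins; over a congested
# 1 Mbps link the 0.8 s transfer dominates and local processing wins.
fast = should_offload(10, 15, 100, rtt_s=0.02,
                      payload_bits=8e5, bandwidth_bps=1e8)
slow = should_offload(10, 15, 100, rtt_s=0.02,
                      payload_bits=8e5, bandwidth_bps=1e6)
```

This mirrors the abstract's point that the crossover between the two strategies shifts with network bandwidth, edge multi-tenancy (here, `edge_background_rate`), and workload intensity, which is what makes an adaptive resource manager useful.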