Recent advances in small inference models have facilitated AI deployment on the edge. However, the resource-constrained nature of edge devices poses new challenges, especially for real-time applications. Deploying multiple inference models (or a single tunable model) that vary in size, and therefore in accuracy and power consumption, alongside an inference model on an edge server yields a dynamic system in which inference models are allocated to inference jobs according to current resource conditions. In this work, we tackle the problem of selectively allocating inference models to jobs, or offloading the jobs to the edge server, to maximize inference accuracy under time and energy constraints. We show that this problem is an instance of the unbounded multidimensional knapsack problem, which is considered strongly NP-hard. We propose a lightweight hybrid genetic algorithm (LGSTO) to solve it, and we introduce a termination condition and neighborhood exploration techniques for faster evolution of populations. We compare LGSTO with naive and dynamic programming solutions, with classic genetic algorithms using different reproduction methods (including NSGA-II), and with other evolutionary methods such as particle swarm optimization (PSO) and ant colony optimization (ACO). Experimental results show that LGSTO ran 3 times faster than the fastest comparable schemes while producing schedules with higher average accuracy.
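To make the setting concrete, the following is a minimal sketch of a genetic search over model-to-job allocations with a stagnation-based termination condition and a single-job neighborhood exploration step. It is not the authors' LGSTO: the model profiles in `MODELS`, the assumption that jobs run sequentially (so time and energy add up), the infeasibility penalty, and every function name and parameter (`evolve`, `local_search`, `patience`, and so on) are hypothetical choices made for illustration only.

```python
import random

# Hypothetical profiles: (accuracy, time in s, energy in J) per job.
# The last entry stands in for offloading a job to the edge server.
MODELS = [
    (0.60, 0.02, 0.5),
    (0.75, 0.05, 1.2),
    (0.85, 0.10, 2.5),
    (0.95, 0.15, 0.8),
]


def totals(schedule):
    """Sum accuracy, time, and energy, assuming jobs run one after another."""
    acc = sum(MODELS[m][0] for m in schedule)
    time_used = sum(MODELS[m][1] for m in schedule)
    energy_used = sum(MODELS[m][2] for m in schedule)
    return acc, time_used, energy_used


def feasible(schedule, time_budget, energy_budget):
    _, time_used, energy_used = totals(schedule)
    return time_used <= time_budget and energy_used <= energy_budget


def fitness(schedule, time_budget, energy_budget):
    """Average accuracy of a feasible schedule, or -1.0 if a budget is exceeded."""
    if not feasible(schedule, time_budget, energy_budget):
        return -1.0
    return totals(schedule)[0] / len(schedule)


def local_search(schedule, time_budget, energy_budget):
    """Neighborhood exploration: accept any single-job model change that improves fitness."""
    best = list(schedule)
    best_fit = fitness(best, time_budget, energy_budget)
    for job in range(len(best)):
        for model in range(len(MODELS)):
            candidate = list(best)
            candidate[job] = model
            cand_fit = fitness(candidate, time_budget, energy_budget)
            if cand_fit > best_fit:
                best, best_fit = candidate, cand_fit
    return best


def evolve(n_jobs, time_budget, energy_budget,
           pop_size=20, max_gens=200, patience=15, seed=0):
    """Genetic search over schedules; requires n_jobs >= 2 for one-point crossover."""
    rng = random.Random(seed)
    pop = [[rng.randrange(len(MODELS)) for _ in range(n_jobs)]
           for _ in range(pop_size)]
    pop.append([0] * n_jobs)  # seed with the cheapest schedule
    best, best_fit, stale = None, float("-inf"), 0
    for _ in range(max_gens):
        pop.sort(key=lambda s: fitness(s, time_budget, energy_budget), reverse=True)
        elite = local_search(pop[0], time_budget, energy_budget)
        elite_fit = fitness(elite, time_budget, energy_budget)
        if elite_fit > best_fit:
            best, best_fit, stale = elite, elite_fit, 0
        else:
            stale += 1
            if stale >= patience:  # termination: no improvement for `patience` generations
                break
        parents = pop[: pop_size // 2]
        children = [elite]  # elitism: carry the improved best schedule forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_jobs)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # mutation: reassign one job's model
                child[rng.randrange(n_jobs)] = rng.randrange(len(MODELS))
            children.append(child)
        pop = children
    return best, best_fit
```

Calling `evolve(10, 1.0, 15.0)` returns a list of ten indices into `MODELS`, one per job, together with the schedule's average accuracy. With these made-up numbers, offloading every job would exceed the time budget, so the search has to mix local models with offloading.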