We present Antler, which exploits the affinity between all pairs of tasks in a multitask inference system to construct a compact graph representation of the task set and to find an optimal order of execution for the tasks, reducing the end-to-end time and energy cost of inference while keeping accuracy similar to the state of the art. The design of Antler is based on two observations. First, tasks running on the same platform show affinity, which is leveraged to find a compact graph representation of the tasks that avoids unnecessary computation of overlapping subtasks in the task set. Second, tasks that run on the same system may have dependencies, which are leveraged to find an optimal ordering of the tasks that avoids unnecessary computation of dependent tasks or of the remaining portion of a task. We implement two systems: a 16-bit TI MSP430FR5994-based custom-designed ultra-low-power system, and a 32-bit ARM Cortex M4/M7-based off-the-shelf STM32H747 board. We conduct both dataset-driven experiments and real-world deployments with these systems. Antler's execution time and energy consumption are lower than those of all baseline systems: by leveraging the similarity of tasks and reusing intermediate results from previous tasks, Antler reduces inference time by 2.3x to 4.6x and saves 56% to 78% of energy compared to the state of the art.
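To make the two observations concrete, the following is a minimal sketch of the general idea, not Antler's actual algorithm. It assumes a hypothetical task set in which each task is a sequence of named blocks, that two tasks can share work only along a common leading run of blocks, and that only the previous task's intermediate results are kept in memory. The block names, the cost model (one unit per executed block), and the brute-force search over permutations are all illustrative assumptions.

```python
from itertools import permutations

# Hypothetical task set (an assumption for illustration): each task is a
# sequence of named blocks. Tasks with a common leading run of blocks can
# reuse the outputs of those blocks.
TASKS = {
    "A": ["conv1", "conv2", "fc_a"],
    "B": ["conv1", "conv2", "fc_b"],
    "C": ["conv1", "fc_c"],
    "D": ["conv1", "conv2", "conv3", "fc_d"],
}


def shared_prefix_len(a, b):
    """Number of leading blocks that two block sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def order_cost(order, tasks):
    """Count blocks executed for a given task order.

    Assumed cost model: only the previous task's intermediate results are
    retained, so a task skips exactly the blocks it shares as a prefix
    with the task that ran immediately before it.
    """
    total = 0
    prev = []
    for name in order:
        blocks = tasks[name]
        total += len(blocks) - shared_prefix_len(prev, blocks)
        prev = blocks
    return total


def best_order(tasks):
    """Brute-force search over all orders (feasible only for small task sets)."""
    return min(permutations(sorted(tasks)), key=lambda o: order_cost(o, tasks))


if __name__ == "__main__":
    no_reuse = sum(len(blocks) for blocks in TASKS.values())
    order = best_order(TASKS)
    print("blocks without reuse:", no_reuse)
    print("best order:", order, "blocks executed:", order_cost(order, TASKS))
```

Under these assumptions, running the four tasks independently executes 12 blocks, the order A, B, C, D executes 8, and the best order executes 7, which equals the number of distinct blocks in the shared graph. Exhaustive search is used here only because the example is tiny; it is a stand-in for whatever ordering procedure a real system would use.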