Edge computing has emerged as a popular scenario for model inference. However, inference on edge devices (e.g., multi-core DSPs, FPGAs) remains inefficient because highly optimized inference frameworks are lacking. Existing model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration for edge-based inference and incurs significant costs for continuous development and maintenance. In this paper, we propose Xenos, which automatically conducts dataflow-centric optimization of the computation graph and accelerates inference in two dimensions. Vertically, Xenos develops an operator linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of the vertical and horizontal dataflow optimizations, which reduce inference time by 21.2\%--84.9\% and 17.9\%--96.2\%, respectively. Xenos also outperforms the widely used TVM by 3.22$\times$--17.92$\times$. Moreover, we extend Xenos to a distributed solution, d-Xenos, which employs multiple edge devices to jointly conduct the inference task and achieves a speedup of 3.68$\times$--3.78$\times$ over a single device.
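To make the horizontal dimension concrete, the following is a minimal sketch of the general idea behind splitting one operator across several compute units. It is not Xenos's implementation: the function names (`matmul`, `split_columns`, `split_matmul`) and the `units` parameter are hypothetical, a matrix product stands in for an arbitrary operator, and Python threads stand in for DSP units (they illustrate the partitioning only, not real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_columns(b, parts):
    """Split b into at most `parts` contiguous, non-empty column blocks."""
    n = len(b[0])
    bounds = [n * i // parts for i in range(parts + 1)]
    return [[row[lo:hi] for row in b]
            for lo, hi in zip(bounds, bounds[1:]) if hi > lo]


def split_matmul(a, b, units=4):
    """Compute a @ b by giving each (hypothetical) unit one column block of b."""
    blocks = split_columns(b, units)
    with ThreadPoolExecutor(max_workers=units) as pool:
        # Each worker computes an independent slice of the output columns.
        partials = list(pool.map(lambda blk: matmul(a, blk), blocks))
    # Stitch the partial outputs back together, row by row.
    return [sum((p[i] for p in partials), []) for i in range(len(a))]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6, 7], [8, 9, 10]]
    print(split_matmul(a, b, units=2))
```

Because each column block of the output depends only on the matching block of the second operand, the partial results need no synchronization beyond the final concatenation. That independence is what makes this kind of split attractive for parallel units.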