Distributing the inference of convolutional neural networks (CNNs) across multiple mobile devices has been studied in recent years as a way to achieve real-time inference without losing accuracy. However, how to map a CNN onto the devices remains a challenge. On the one hand, scheduling the workload of state-of-the-art CNNs across multiple devices is NP-hard because their structures are directed acyclic graphs (DAGs) rather than simple chains. On the other hand, distributing the inference workload suffers from expensive communication and unbalanced computation due to the wireless environment and the heterogeneity of the devices. This paper presents PICO, a pipeline cooperation framework that accelerates the inference of versatile CNNs on diverse mobile devices. At its core, PICO features: (1) a generic graph partition algorithm that considers the characteristics of any given CNN and orchestrates it into a list of model pieces with suitable granularity, and (2) a many-to-many mapping algorithm that produces the best pipeline configuration for heterogeneous devices. In our experiments with 2–8 Raspberry Pi devices, throughput improves by 1.8–6.8× under different CPU frequencies.