Distributing convolutional neural network (CNN) inference across multiple mobile devices has been studied in recent years as a way to achieve real-time inference without losing accuracy. However, mapping a CNN onto devices remains a challenge. On the one hand, scheduling the workload of state-of-the-art CNNs across multiple devices is NP-hard because their structures are directed acyclic graphs (DAGs) rather than simple chains. On the other hand, distributed inference suffers from expensive communication and unbalanced computation due to the wireless environment and heterogeneous devices. This paper presents PICO, a pipeline cooperation framework that accelerates the inference of versatile CNNs on diverse mobile devices. At its core, PICO features (1) a generic graph partition algorithm that considers the characteristics of any given CNN and orchestrates it into a list of model pieces with suitable granularity, and (2) a many-to-many mapping algorithm that produces the best pipeline configuration for heterogeneous devices. In our experiments with 2 to 8 Raspberry Pi devices, throughput improves by 1.8x to 6.8x under different CPU frequencies.
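To make the pipeline-mapping problem concrete, the following is a minimal sketch of a much simpler setting than the one PICO addresses, and it is not PICO's algorithm. It assumes the model has already been reduced to a chain of pieces, that each pipeline stage runs on exactly one device in a fixed order, and that communication cost is ignored. The names `best_pipeline`, `piece_costs`, and `device_speeds` are hypothetical. Under these assumptions, pipeline throughput is the inverse of the slowest stage's time, so the sketch searches for the contiguous split that minimizes that bottleneck.

```python
from functools import lru_cache


def best_pipeline(piece_costs, device_speeds):
    """Split a chain of model pieces into contiguous stages, one per device.

    piece_costs:   compute cost of each piece (hypothetical units, e.g. FLOPs)
    device_speeds: cost units each device processes per second, in stage order
    Returns (bottleneck_time, cuts), where cuts[d] is the exclusive end index
    of the pieces assigned to device d. A device may receive no pieces.
    """
    n, m = len(piece_costs), len(device_speeds)

    # prefix[i] holds the total cost of pieces[0:i], for O(1) range sums.
    prefix = [0.0]
    for cost in piece_costs:
        prefix.append(prefix[-1] + cost)

    @lru_cache(maxsize=None)
    def solve(i, d):
        # Minimum bottleneck time for pieces[i:] placed on devices[d:].
        if d == m - 1:
            # The last device must take every remaining piece.
            return (prefix[n] - prefix[i]) / device_speeds[d], (n,)
        best = (float("inf"), ())
        for j in range(i, n + 1):
            # Device d runs pieces[i:j]; the rest go to later devices.
            stage_time = (prefix[j] - prefix[i]) / device_speeds[d]
            rest_time, rest_cuts = solve(j, d + 1)
            bottleneck = max(stage_time, rest_time)
            if bottleneck < best[0]:
                best = (bottleneck, (j,) + rest_cuts)
        return best

    return solve(0, 0)


if __name__ == "__main__":
    # Four pieces on two devices, the first twice as fast as the second.
    bottleneck, cuts = best_pipeline([4, 2, 2, 4], [2, 1])
    print("bottleneck time:", bottleneck, "cuts:", cuts)
    print("throughput (inferences per second):", 1.0 / bottleneck)
```

Even this restricted version shows why unbalanced stages hurt: any device that finishes early idles until the bottleneck stage completes. Handling DAG-structured models, several devices per stage, and wireless communication cost is what makes the general problem hard.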