Deep neural networks (DNNs) have substantial computational and memory requirements, and the way their computational graphs are compiled has a large impact on the performance of resource-constrained (e.g., computation-, I/O-, and memory-bound) edge computing systems. Executing a computational graph efficiently requires an effective scheduling algorithm, but generating the optimal schedule is an NP-hard problem. The complexity of scheduling DNN computational graphs increases further on pipelined multi-core systems, where memory communication cost must be considered, and as DNNs grow in size. This work presents RESPECT, a reinforcement learning (RL) based scheduling framework that is trained on synthetic graphs, learns the behavior of exact optimization algorithms, and generates near-optimal schedules with low solving-time overhead. With ten popular ImageNet models deployed on a physical Coral Edge TPU system, our framework achieves up to $\sim2.5\times$ real-world on-chip inference runtime speedup over the commercial compiler. Moreover, the proposed RL scheduling reduces scheduling optimization runtime by up to 683$\times$ compared to the commercial compiler, and it matches the exact optimal solutions while solving up to 930$\times$ faster than the exact optimization methods. Finally, a comprehensive generalizability test demonstrates that RESPECT successfully transfers the optimal solving behavior it imitates from small synthetic graphs to large real-world DNN computational graphs.
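To make the scheduling problem concrete, the following is a minimal sketch of an exhaustive pipeline scheduler on a toy graph. It is not the RESPECT method or the exact solver used in this work. The cost model (per-operator compute cost plus a transfer cost charged to the consumer stage for every edge that crosses a stage boundary), the restriction to contiguous segments of one fixed topological order, and the names `stage_costs` and `exhaustive_schedule` are all assumptions made for illustration.

```python
from itertools import combinations


def stage_costs(compute, edges, cuts):
    """Return the cost of each pipeline stage for the given cut positions.

    compute: per-operator compute cost, listed in a topological order.
    edges:   dict mapping (src, dst) operator indices to a transfer cost.
    cuts:    sorted operator indices at which a new stage begins.
    """
    bounds = [0, *cuts, len(compute)]
    num_stages = len(bounds) - 1

    # Map every operator to the stage that contains it.
    stage_of = {}
    for s in range(num_stages):
        for op in range(bounds[s], bounds[s + 1]):
            stage_of[op] = s

    costs = [sum(compute[bounds[s]:bounds[s + 1]]) for s in range(num_stages)]

    # Assumption: a cross-stage edge adds its transfer cost to the consumer stage.
    for (src, dst), transfer in edges.items():
        if stage_of[src] != stage_of[dst]:
            costs[stage_of[dst]] += transfer
    return costs


def exhaustive_schedule(compute, edges, num_stages):
    """Try every contiguous partition and keep the smallest bottleneck stage."""
    best_cuts, best_bottleneck = None, float("inf")
    for cuts in combinations(range(1, len(compute)), num_stages - 1):
        bottleneck = max(stage_costs(compute, edges, cuts))
        if bottleneck < best_bottleneck:
            best_cuts, best_bottleneck = cuts, bottleneck
    return best_cuts, best_bottleneck


# Hypothetical six-operator graph scheduled onto three pipeline stages.
compute = [4, 2, 3, 5, 1, 3]
edges = {(0, 1): 1, (1, 2): 2, (0, 3): 1, (2, 4): 1, (3, 4): 2, (4, 5): 1}
cuts, bottleneck = exhaustive_schedule(compute, edges, num_stages=3)
print(cuts, bottleneck)
```

Even under this simplified model, a graph with $n$ operators and $k$ stages has $\binom{n-1}{k-1}$ candidate partitions, so exhaustive search becomes impractical as graphs grow. A learned scheduler aims to avoid that cost.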