Memory-aware network scheduling is becoming increasingly important for deep neural network (DNN) inference on resource-constrained devices. However, complex cell-level and network-level topologies make memory-aware scheduling very challenging, and previous algorithms all suffer from poor scalability. In this paper, we propose an efficient memory-aware scheduling framework based on iterative computation graph optimization. Our framework features an iterative graph fusion algorithm that simplifies the computation graph while preserving scheduling optimality. We further propose an integer linear programming formulation, together with topology-aware variable pruning, to schedule the simplified graph efficiently. We evaluate our method against prior-art algorithms on different networks and demonstrate that it outperforms existing techniques on all benchmarks, reducing the peak memory footprint by 13.4% and achieving better scalability for networks with complex network-level topologies.
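To make the scheduling problem concrete, the following is a minimal sketch of how the execution order of a computation graph affects peak memory. It is not the graph fusion algorithm or the integer linear programming formulation proposed here: it enumerates every topological order by brute force, which is only feasible for toy graphs. The function names, the example graph, the tensor sizes, and the memory model (an output is allocated before its inputs are released, and is freed once its last consumer has run) are all illustrative assumptions.

```python
from itertools import permutations


def peak_memory(order, size, preds):
    """Peak live memory when nodes run in `order` (assumed memory model)."""
    # Count how many consumers each node's output still has.
    remaining = {n: 0 for n in size}
    for n in preds:
        for p in preds[n]:
            remaining[p] += 1
    live = 0
    peak = 0
    for n in order:
        live += size[n]              # allocate the output of n
        peak = max(peak, live)
        for p in preds[n]:           # release inputs with no consumers left
            remaining[p] -= 1
            if remaining[p] == 0:
                live -= size[p]
    return peak


def is_topological(order, preds):
    """True if every node appears after all of its predecessors."""
    seen = set()
    for n in order:
        if any(p not in seen for p in preds[n]):
            return False
        seen.add(n)
    return True


def best_schedule(size, preds):
    """Brute-force search over all orders; exponential, toy graphs only."""
    best = None
    for order in permutations(size):
        if is_topological(order, preds):
            m = peak_memory(order, size, preds)
            if best is None or m < best[0]:
                best = (m, order)
    return best


# Hypothetical two-branch cell: a feeds b and d; c and e are merged by f.
size = {"a": 1, "b": 4, "c": 1, "d": 4, "e": 1, "f": 1}
preds = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"], "e": ["d"], "f": ["c", "e"]}

interleaved = peak_memory(["a", "b", "d", "c", "e", "f"], size, preds)
best_peak, best_order = best_schedule(size, preds)
print(interleaved, best_peak, best_order)
```

On this toy graph, interleaving the two branches keeps both large intermediate tensors alive at once, whereas finishing one branch before starting the other does not. The number of valid orders grows rapidly with graph size, which is why exhaustive search of this kind does not scale.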