Deep neural networks (DNNs) have become important workloads on both cloud servers and edge devices. Meanwhile, the growing number of DNNs deployed on these platforms creates a need to execute multiple DNNs on the same device. This paper proposes a dynamic partitioning algorithm for concurrent processing of multiple DNNs on a systolic-array-based accelerator. Sharing an accelerator's storage and processing resources across multiple DNNs increases resource utilization and reduces both computation time and energy consumption. To this end, we propose a partitioned weight-stationary dataflow that requires only a minor modification to the processing-element logic. We evaluate energy consumption and computation time under both heavy and light workloads. Compared with single tenancy, simulation results show improvements of 35% and 62% in energy consumption and of 56% and 44% in computation time under heavy and light workloads, respectively.
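To make the idea of a partitioned weight-stationary dataflow more concrete, here is a minimal functional sketch. It is not the design described in the paper. The sketch assumes the array's columns are split into contiguous blocks, with one block per DNN (tenant). It models the "minor modification" to the processing element as a hypothetical boundary flag: the first PE of each partition takes a freshly injected activation for its own tenant instead of forwarding the one from its left neighbor. The names `Partition`, `PartitionedWSArray`, `load_weights`, and `run` are illustrative only. The sketch does not model cycle timing, input skew, or the dynamic partitioning algorithm itself.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """A contiguous block of array columns assigned to one DNN (tenant)."""
    tenant: str
    col_start: int
    col_end: int  # exclusive


class PartitionedWSArray:
    """Functional (not cycle-accurate) model of a weight-stationary systolic
    array whose columns are split among tenants. Hypothetical sketch only."""

    def __init__(self, rows: int, cols: int, partitions: list[Partition]):
        self.rows, self.cols = rows, cols
        self.partitions = {p.tenant: p for p in partitions}
        # Weight-stationary: PE (r, c) keeps weights[r][c] for the whole run.
        self.weights = [[0.0] * cols for _ in range(rows)]
        # Hypothetical PE modification: at the first column of a partition,
        # the PE takes an injected activation for its own tenant instead of
        # the value forwarded from its left neighbor.
        self.boundary_owner = {p.col_start: p.tenant for p in partitions}

    def load_weights(self, tenant: str, w: list[list[float]]) -> None:
        """Write a tenant's weight tile into its own columns only."""
        p = self.partitions[tenant]
        width = p.col_end - p.col_start
        assert len(w) == self.rows and all(len(row) == width for row in w)
        for r in range(self.rows):
            for j in range(width):
                self.weights[r][p.col_start + j] = w[r][j]

    def run(self, inputs: dict[str, list[float]]) -> dict[str, list[float]]:
        """Each tenant supplies one activation per array row; returns, per
        tenant, the vector-matrix product x^T W computed by its columns."""
        psum = [0.0] * self.cols
        for r in range(self.rows):
            act = 0.0  # columns before the first partition see no activation
            for c in range(self.cols):
                if c in self.boundary_owner:
                    act = inputs[self.boundary_owner[c]][r]
                # Activations move right along the row; partial sums move
                # down column c, accumulating one product per PE.
                psum[c] += act * self.weights[r][c]
        return {t: psum[p.col_start:p.col_end] for t, p in self.partitions.items()}


# Two tenants share a 2x4 array, two columns each.
arr = PartitionedWSArray(rows=2, cols=4, partitions=[
    Partition("dnn_a", 0, 2), Partition("dnn_b", 2, 4)])
arr.load_weights("dnn_a", [[1.0, 2.0], [3.0, 4.0]])
arr.load_weights("dnn_b", [[5.0, 6.0], [7.0, 8.0]])
out = arr.run({"dnn_a": [1.0, 1.0], "dnn_b": [1.0, 0.0]})
print(out)  # {'dnn_a': [4.0, 6.0], 'dnn_b': [5.0, 6.0]}
```

Because each partition receives its activations at its own boundary PE, one tenant's inputs never reach another tenant's columns. Both DNNs therefore occupy the array in the same pass instead of taking turns, which is the single-tenancy baseline the abstract compares against.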