DNN workloads can be scheduled onto DNN accelerators in many different ways, from layer-by-layer scheduling to cross-layer depth-first scheduling (a.k.a. layer fusion or cascaded execution). This yields a very broad scheduling space, in which each schedule incurs different hardware (HW) costs in terms of energy and latency. To explore this space rapidly for a wide variety of hardware architectures, analytical cost models are crucial for estimating scheduling effects at the HW level. However, state-of-the-art cost models lack support for exploring the complete depth-first scheduling space: for instance, they focus only on activations while ignoring weights, or model only DRAM accesses while overlooking on-chip data movement. These limitations prevent researchers from understanding the depth-first scheduling space systematically and accurately. After formalizing this design space, this work proposes DeFiNES, a unified modeling framework for layer-by-layer and depth-first scheduling, to fill these gaps. DeFiNES analytically estimates the hardware cost of possible schedules in terms of both energy and latency, while considering data accesses at every memory level. For each schedule and HW architecture under study, it does so by optimally choosing the active part of the memory hierarchy for each unique combination of operand, layer, and feature map tile. The hardware cost estimates account for both data computation and data copy phases. The analytical cost model is validated against measured data from a taped-out depth-first DNN accelerator, DepFiN, showing good modeling accuracy at the end-to-end neural network level. A comparison with the generalized state of the art demonstrates that DeFiNES finds up to 10X better solutions.
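To make the idea of choosing the active part of the memory hierarchy more concrete, the following is a minimal toy sketch, not the DeFiNES implementation or its API. It assumes a hypothetical three-level hierarchy with invented capacities and per-byte energies, and it simply picks the lowest level that can hold a given data block (for example, one feature map tile of one operand). The names `MemLevel`, `HIERARCHY`, `top_level_for`, and `access_energy_pj` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemLevel:
    """One level of a hypothetical memory hierarchy."""
    name: str
    capacity_bytes: int
    energy_per_byte_pj: float  # invented value for illustration, not measured


# Ordered from closest to the compute array (cheapest) to farthest (costliest).
# All capacities and energies are assumptions made up for this sketch.
HIERARCHY = [
    MemLevel("local_buffer", 64 * 1024, 0.5),
    MemLevel("global_buffer", 2 * 1024 * 1024, 5.0),
    MemLevel("dram", 10**12, 100.0),
]


def top_level_for(data_bytes, hierarchy=HIERARCHY):
    """Return the lowest memory level that can hold the whole data block."""
    for level in hierarchy:
        if data_bytes <= level.capacity_bytes:
            return level
    raise ValueError("data block does not fit in any memory level")


def access_energy_pj(data_bytes, hierarchy=HIERARCHY):
    """Estimate the energy of moving the block once to or from its top level."""
    return data_bytes * top_level_for(data_bytes, hierarchy).energy_per_byte_pj


if __name__ == "__main__":
    full_feature_map = 4 * 1024 * 1024  # layer-by-layer: whole feature map
    tile = 32 * 1024                    # depth-first: one small tile
    print(top_level_for(full_feature_map).name, access_energy_pj(full_feature_map))
    print(top_level_for(tile).name, access_energy_pj(tile))
```

Under these assumed numbers, a full feature map spills to the costliest level while a small tile stays in the local buffer, which is the intuition behind depth-first scheduling. A real model would additionally have to handle weights, memory shared between operands, and the data copy phases the abstract mentions.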