DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling

DNN workloads can be scheduled onto DNN accelerators in many different ways: from layer-by-layer scheduling to cross-layer depth-first scheduling (a.k.a. layer fusion, or cascaded execution). This results in a very broad scheduling space, with each schedule leading to different hardware (HW) costs in terms of energy and latency. To rapidly explore this vast space for a wide variety of hardware architectures, analytical cost models are crucial for estimating scheduling effects at the HW level. However, state-of-the-art cost models lack support for exploring the complete depth-first scheduling space: some focus only on activations while ignoring weights, and others model only DRAM accesses while overlooking on-chip data movements. These limitations prevent researchers from systematically and accurately understanding the depth-first scheduling space. After formalizing this design space, this work proposes a unified modeling framework, DeFiNES, for layer-by-layer and depth-first scheduling to fill in the gaps. DeFiNES analytically estimates the hardware cost of possible schedules in terms of both energy and latency, while considering data accesses at every memory level. For each schedule and HW architecture under study, it does so by optimally choosing the active part of the memory hierarchy per unique combination of operand, layer, and feature map tile. The hardware cost estimates take into account both data computation and data copy phases. The analytical cost model is validated against measured data from a taped-out depth-first DNN accelerator, DepFiN, showing good modeling accuracy at the end-to-end neural network level. A comparison with generalized state-of-the-art demonstrates up to 10X better solutions found with DeFiNES.
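The idea of choosing an active memory level per operand and feature map tile can be illustrated with a minimal Python sketch. It is not the DeFiNES implementation: the hierarchy, capacities, per-byte energies, and the names `MemLevel`, `pick_level`, and `tile_access_energy_pj` are all hypothetical, and the greedy "smallest level that fits" rule stands in for the optimal selection described in the abstract. It also treats each operand independently and ignores latency and data copy phases.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class MemLevel:
    name: str
    capacity_bytes: int
    energy_per_byte_pj: float  # hypothetical access energy, not measured data


# Hypothetical hierarchy, ordered from closest-to-compute up to DRAM.
HIERARCHY = (
    MemLevel("LB", 64 * 1024, 1.0),
    MemLevel("GB", 1024 * 1024, 5.0),
    MemLevel("DRAM", 2**40, 100.0),
)


def pick_level(size_bytes: int, hierarchy: Sequence[MemLevel] = HIERARCHY) -> MemLevel:
    """Return the first (cheapest) level whose capacity holds the tile's data."""
    for level in hierarchy:
        if size_bytes <= level.capacity_bytes:
            return level
    raise ValueError("tile does not fit in any memory level")


def tile_access_energy_pj(footprints: Mapping[str, int]) -> float:
    """Sum a crude access-energy estimate over operands (e.g. W, I, O) of one tile."""
    return sum(
        size * pick_level(size).energy_per_byte_pj for size in footprints.values()
    )


# Assumed example: weights fit in the local buffer, inputs spill to the global buffer.
example_tile = {"W": 1024, "I": 128 * 1024}
example_energy = tile_access_energy_pj(example_tile)
```

Shrinking the tile in this toy model moves operands to cheaper levels, which is the kind of trade-off an analytical model can evaluate quickly across many candidate schedules.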

