As machine learning applications continue to evolve, efficient hardware accelerators tailored to deep neural networks (DNNs) become increasingly important. In this paper, we propose a configurable memory hierarchy framework designed for the per-layer adaptive memory access patterns of DNNs. The hierarchy requests data on demand from off-chip memory and supplies it to the accelerator's compute units. The objective is to balance minimizing the required memory capacity against maintaining high accelerator performance. The framework is configurable and allows the creation of a tailored memory hierarchy with up to five levels. Furthermore, it incorporates an optional shift register as the final level to increase the flexibility of memory management. A comprehensive loop-nest analysis of DNN layers shows that the framework can efficiently execute the access patterns of most loop unrollings. Synthesis results and a case study of the DNN accelerator UltraTrail indicate a possible reduction in chip area of up to 62.2%, because smaller memory modules can be used, while the performance loss can be limited to 2.4%.
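To make the on-demand principle more concrete, the following is a minimal behavioral sketch in Python. It is not the authors' framework. The class and function names (`BufferLevel`, `OffChipMemory`, `ShiftRegister`, `build_hierarchy`, `conv1d_multi_filter`), the LRU replacement policy, and the toy 1D convolution loop nest are all illustrative assumptions. Each level forwards a miss to the level below, so only the data the compute side actually requests reaches off-chip memory. The sketch shows how level capacity trades against off-chip traffic.

```python
from collections import OrderedDict, deque


class OffChipMemory:
    """Backing store that counts every request reaching off-chip memory."""

    def __init__(self, data):
        self.data = data
        self.requests = 0

    def read(self, addr):
        self.requests += 1
        return self.data[addr]


class BufferLevel:
    """One on-chip level. ASSUMPTION: modeled as a fully associative LRU buffer."""

    def __init__(self, capacity, lower):
        self.capacity = capacity
        self.lower = lower           # next level towards off-chip memory
        self.lines = OrderedDict()   # address -> value, least recently used first

    def read(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)       # mark as most recently used
            return self.lines[addr]
        value = self.lower.read(addr)          # miss: request on demand from below
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)     # evict least recently used entry
        return value


class ShiftRegister:
    """Optional final level (hypothetical model): keeps the last `length` words."""

    def __init__(self, length, lower):
        self.window = deque(maxlen=length)     # (addr, value) pairs in arrival order
        self.lower = lower

    def read(self, addr):
        for a, v in self.window:
            if a == addr:
                return v
        value = self.lower.read(addr)
        self.window.append((addr, value))      # oldest entry drops out when full
        return value


def build_hierarchy(data, capacities, shift_len=0):
    """Chain one to five levels, listed from the off-chip side to the compute side."""
    if not 1 <= len(capacities) <= 5:
        raise ValueError("expected between one and five levels")
    mem = OffChipMemory(data)
    top = mem
    for cap in capacities:
        top = BufferLevel(cap, top)
    if shift_len:
        top = ShiftRegister(shift_len, top)
    return mem, top


def conv1d_multi_filter(top, weights, length):
    """Toy loop nest: filters (outer), output positions, kernel taps (inner)."""
    outputs = []
    for w in weights:
        row = []
        for x in range(length - len(w) + 1):
            row.append(sum(w[r] * top.read(x + r) for r in range(len(w))))
        outputs.append(row)
    return outputs


data = list(range(16))
weights = [[1, 0, -1], [1, 1, 1], [0, 2, 0], [1, 2, 1]]
for caps in ([16, 4], [4]):
    mem, top = build_hierarchy(data, caps)
    conv1d_multi_filter(top, weights, len(data))
    print(caps, mem.requests)   # prints "[16, 4] 16", then "[4] 64"
```

With a 16-word outer level, every input word crosses the off-chip interface once and is reused across all four filters; with a single 4-word level, the input is re-fetched for each filter. These are toy numbers that only illustrate the capacity versus traffic trade-off, not results from the paper.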