Stringent reliability requirements for Deep Neural Network (DNN) accelerators coexist with the need to reduce the computational burden on hardware platforms, i.e., to lower energy consumption and execution time and to increase accelerator efficiency. Moreover, the growing demand for specialized DNN accelerators with tailored requirements, particularly for safety-critical applications, calls for comprehensive design space exploration so that efficient and robust accelerators meeting those requirements can be developed. The trade-off between hardware performance, i.e., area and delay, and the reliability of the DNN accelerator implementation therefore becomes critical and requires tools for analysis. This paper presents a methodology for a holistic assessment of the three-way impact of quantization on model accuracy, reliability against activation faults, and hardware efficiency. A fully automated framework is introduced that applies various quantization-aware techniques, performs fault injection, and generates the hardware implementation, thus enabling the measurement of hardware parameters. In addition, this paper proposes a novel lightweight protection technique, integrated within the framework, to ensure dependable deployment of the final systolic-array-based FPGA implementation. Experiments on established benchmarks demonstrate the analysis flow and show the strong effect of quantization on reliability, hardware performance, and network accuracy, particularly with respect to transient faults in the network's activations.
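As a rough illustration of what a transient fault in a quantized activation can look like, the following minimal sketch quantizes a single activation value and flips one bit of its integer encoding. The function names, the symmetric uniform quantizer, and the single-bit-flip fault model are assumptions made for illustration only; they are not taken from the framework described in this paper.

```python
import random


def quantize(x, bits, max_abs):
    """Symmetric uniform quantization of x to a signed `bits`-bit integer.

    Returns the integer code and the scale needed to dequantize it.
    (Hypothetical helper, not the paper's quantizer.)
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    q = round(x / scale)
    # Clamp to the representable two's-complement range.
    return max(-qmax - 1, min(qmax, q)), scale


def flip_bit(q, bit, bits):
    """Flip one bit of q's two's-complement encoding; return the signed result."""
    mask = (1 << bits) - 1
    u = (q & mask) ^ (1 << bit)  # unsigned view with one bit toggled
    # Reinterpret the unsigned pattern as a signed value.
    return u - (1 << bits) if u >= (1 << (bits - 1)) else u


def inject_transient_fault(x, bits, max_abs, rng):
    """Quantize x, flip one randomly chosen bit, and dequantize the result."""
    q, scale = quantize(x, bits, max_abs)
    bit = rng.randrange(bits)
    return flip_bit(q, bit, bits) * scale


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    activation = 0.5
    for bits in (8, 4):
        faulty = inject_transient_fault(activation, bits, 1.0, rng)
        print(f"{bits}-bit: clean={activation}, faulty={faulty}")
```

In this toy model, the magnitude of a faulty value can never exceed the representable range of the chosen bit width, which is one intuition for why the quantization level and the effect of activation faults are worth studying together.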