Deep Neural Networks (DNNs) have emerged as the most effective programming paradigm for computer vision and natural language processing applications. With the rapid development of DNNs, efficient hardware architectures for deploying DNN-based applications on edge devices have been studied extensively. Emerging Non-Volatile Memories (NVMs), with their better scalability, non-volatility, and good read performance, are promising candidates for deploying DNNs. Despite this promise, emerging NVMs often suffer from reliability issues such as stuck-at faults, which reduce chip yield and memory lifetime and can severely degrade DNN accuracy. A stuck-at cell can be read but not reprogrammed. Consequently, a stuck-at fault causes an error only when the bit to be stored in that cell differs from the value at which the cell is stuck. Reducing the number of errors caused by stuck-at faults therefore enhances the reliability of a DNN-based system. This paper proposes CRAFT (Criticality-Aware Fault-Tolerance Enhancement Techniques) to enhance the reliability of NVM-based DNNs in the presence of stuck-at faults. A data block remapping technique reduces the impact of stuck-at faults on DNN accuracy. Additionally, a bit-level criticality analysis of various DNNs identifies the bit positions in network parameters that significantly affect accuracy. Based on this analysis, we propose an encoding method that swaps the critical bit positions with non-critical ones when more errors (due to stuck-at faults) are present in the critical bits.
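The abstract does not specify CRAFT's remapping granularity, encoding layout, or which bit positions its analysis marks as critical, so the following is only a hypothetical sketch of the two ideas under simplifying assumptions. It assumes 8-bit stored parameters, treats the upper nibble as the critical bits (`CRITICAL_MASK` is an assumption), models a swap of the upper and lower nibbles with a 1-bit flag kept in fault-free metadata, and shows remapping as a brute-force search over block assignments. All function names (`write_read`, `encode`, `remap`, and so on) are illustrative, not taken from the paper.

```python
from itertools import permutations

# Hypothetical illustration only, not CRAFT's actual implementation.
CRITICAL_MASK = 0xF0  # assumption: the upper nibble holds the critical bits


def write_read(value: int, stuck_mask: int, stuck_vals: int) -> int:
    """Write an 8-bit value to a word whose cells in stuck_mask are stuck at
    the matching bits of stuck_vals, and return what is read back."""
    return (value & ~stuck_mask & 0xFF) | (stuck_vals & stuck_mask)


def swap_nibbles(value: int) -> int:
    """Exchange the upper (critical) and lower (non-critical) 4-bit halves."""
    return ((value << 4) | (value >> 4)) & 0xFF


def encode(value: int, stuck_mask: int, stuck_vals: int):
    """Return (word_to_store, swap_flag), choosing whichever layout leaves
    fewer errors in the critical bits after decoding. Ties keep the plain layout."""
    plain_err = (value ^ write_read(value, stuck_mask, stuck_vals)) & CRITICAL_MASK
    swapped = swap_nibbles(value)
    decoded = swap_nibbles(write_read(swapped, stuck_mask, stuck_vals))
    swap_err = (value ^ decoded) & CRITICAL_MASK
    if bin(swap_err).count("1") < bin(plain_err).count("1"):
        return swapped, True
    return value, False


def store_and_load(value: int, stuck_mask: int, stuck_vals: int) -> int:
    """Encode, write through the faulty word, read back, and decode.
    Assumes the swap flag itself is stored in fault-free metadata."""
    stored, flag = encode(value, stuck_mask, stuck_vals)
    physical = write_read(stored, stuck_mask, stuck_vals)
    return swap_nibbles(physical) if flag else physical


def count_errors(block, faults) -> int:
    """Total bit errors when the words of a data block are written to a physical
    block whose per-word faults are given as (stuck_mask, stuck_vals) pairs."""
    return sum(bin(v ^ write_read(v, m, s)).count("1")
               for v, (m, s) in zip(block, faults))


def remap(data_blocks, phys_faults):
    """Pick the data-to-physical block assignment with the fewest bit errors.
    Exhaustive search, so only practical for a handful of blocks."""
    best = min(
        permutations(range(len(phys_faults))),
        key=lambda p: sum(count_errors(d, phys_faults[i])
                          for d, i in zip(data_blocks, p)),
    )
    return list(best)
```

As a usage example, take the weight `0x63` stored in a word whose bit 7 is stuck at 1 and bit 6 is stuck at 0. Stored plainly, both faulty cells hit critical bits. After the nibble swap, only one faulty cell mismatches the data, and after decoding that error lands in a non-critical bit, so `store_and_load(0x63, 0xC0, 0x80)` returns `0x6B` instead of `0xA3`. A stuck cell whose value already matches the data, as in `write_read(0x80, 0x80, 0x80)`, causes no error at all, which is why remapping blocks onto better-matching faulty locations can help.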