The superior performance of Deep Neural Networks (DNNs) has led to their application in many aspects of human life. Safety-critical applications are no exception, and they impose rigorous reliability requirements on DNNs. Quantized Neural Networks (QNNs) have emerged to reduce the complexity of DNN accelerators; however, they are more prone to reliability issues. In this paper, a recent analytical resilience assessment method is adapted to QNNs to identify critical neurons based on a Neuron Vulnerability Factor (NVF). A novel method for splitting the critical neurons is then proposed, which enables the design of a Lightweight Correction Unit (LCU) in the accelerator without redesigning its computational part. The method is validated by experiments on different QNNs and datasets. The results demonstrate that the proposed fault-correction method incurs half the overhead of selective Triple Modular Redundancy (TMR) while achieving a similar level of fault resiliency.