Deep neural networks (DNNs) are increasingly used in safety-critical applications. Reliable fault analysis and mitigation are essential to ensure their functionality in harsh environments with high radiation levels. This study analyses the impact of multiple single-bit single-event upsets in DNNs by performing fault injection at the level of the DNN model. Additionally, a fault-aware training (FAT) methodology is proposed that improves the DNNs' robustness to faults without any modification to the hardware. Experimental results show that the FAT methodology improves the tolerance to faults by up to a factor of 3.
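To make the idea of model-level fault injection concrete, the following is a minimal sketch of how multiple single-bit upsets could be emulated in stored weights. It is not the study's injection framework. It assumes weights are stored as IEEE 754 float32 values and that upsets hit uniformly random, distinct bit positions; the function names `to_bits`, `from_bits`, `flip_bit`, and `inject_upsets` are hypothetical.

```python
import random
import struct


def to_bits(value: float) -> int:
    """Encode a float as its 32-bit IEEE 754 single-precision word."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def from_bits(word: int) -> float:
    """Decode a 32-bit word back into a float."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


def flip_bit(value: float, bit: int) -> float:
    """Invert one bit of the float32 encoding (0 = LSB, 31 = sign bit)."""
    return from_bits(to_bits(value) ^ (1 << bit))


def inject_upsets(weights, n_faults, rng):
    """Return a copy of `weights` with `n_faults` single-bit flips.

    Assumption: each upset hits a distinct (weight, bit) position chosen
    uniformly at random, so no two flips cancel each other out.
    """
    words = [to_bits(w) for w in weights]
    for position in rng.sample(range(len(words) * 32), n_faults):
        index, bit = divmod(position, 32)
        words[index] ^= 1 << bit
    return [from_bits(w) for w in words]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a campaign is repeatable
    clean = [0.5, -0.25, 1.0, 0.125]
    print(inject_upsets(clean, n_faults=3, rng=rng))
```

The position of a flip matters a great deal: inverting the sign bit of 1.0 gives -1.0, while inverting the most significant exponent bit gives infinity. In a sketch like this, fault-aware training would amount to running such an injection on the weights during training steps, although the abstract does not specify how the proposed FAT methodology does this.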