To ensure resilient neural network processing on unreliable hardware, deep neural network models generally require a comprehensive reliability analysis against various hardware faults before deployment, which makes efficient fault injection tools highly desirable. However, most existing fault injection tools remain limited to basic fault injection into neurons and lack fine-grained vulnerability analysis capabilities. In addition, many fault injection tools still require modifying the neural network model. This tightly couples fault injection with normal neural network processing, which complicates the use of these tools and slows down fault simulation. In this work, we propose MRFI, a highly configurable multi-resolution fault injection tool for deep neural networks. It lets users perform fault injection and vulnerability analysis by editing an independent fault configuration file instead of the neural network model. In particular, it integrates extensive fault analysis functionalities from different perspectives and enables multi-resolution investigation of neural network vulnerability. Moreover, it does not modify PyTorch's core neural network computing framework. Hence, it naturally supports parallel processing on GPUs, and our experiments show that it achieves fast fault simulation.
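The idea of driving fault injection from a separate configuration, rather than editing the model, can be illustrated with a small sketch. This is not MRFI's API or configuration format. Every name here (`FAULT_CONFIG`, `run_model`, `inject`, `flip_bit`, the `"bit_flip"` fault type and its `rate`/`bits` keys) is hypothetical. The sketch uses plain Python functions as stand-in "layers" so that it runs without PyTorch. In PyTorch, a comparable effect is commonly achieved by attaching `torch.nn.Module.register_forward_hook` callbacks to modules, but whether MRFI works this way is an assumption. The model code stays untouched, and only the configuration dict decides which layer outputs are corrupted by flipping bits of their IEEE-754 float32 encoding.

```python
import random
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 31 = sign) of the float32 encoding of value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped

def inject(values, fault, rng):
    """Apply a hypothetical 'bit_flip' fault to each value with probability fault['rate']."""
    if fault["type"] != "bit_flip":
        raise ValueError(f"unsupported fault type: {fault['type']}")
    out = []
    for v in values:
        if rng.random() < fault["rate"]:
            v = flip_bit(v, rng.choice(fault["bits"]))
        out.append(v)
    return out

def run_model(layers, x, config=None, rng=None):
    """Run named layers in order; corrupt the output of any layer named in config.

    The layers themselves are never modified: the fault configuration is the
    only thing that changes between a clean run and a faulty run.
    """
    config = config or {}
    rng = rng or random.Random(0)  # seeded so fault sites are reproducible
    for name, layer in layers:
        x = layer(x)
        fault = config.get(name)
        if fault is not None:
            x = inject(x, fault, rng)
    return x

# Stand-in "model": a linear scaling followed by a ReLU.
def scale(xs):
    return [2.0 * v for v in xs]

def relu(xs):
    return [max(0.0, v) for v in xs]

LAYERS = [("fc1", scale), ("act1", relu)]

# Hypothetical fault configuration, kept separate from the model definition:
# flip bit 30 (the most significant exponent bit) of every fc1 output.
FAULT_CONFIG = {"fc1": {"type": "bit_flip", "rate": 1.0, "bits": [30]}}

if __name__ == "__main__":
    x = [1.0, -0.5, 0.25]
    print("clean:", run_model(LAYERS, x))
    print("faulty:", run_model(LAYERS, x, FAULT_CONFIG))
```

Changing only `FAULT_CONFIG`, for example by targeting `"act1"` instead of `"fc1"` or by lowering `rate`, moves or thins the injected faults without touching `scale`, `relu`, or `LAYERS`. This is the decoupling the abstract describes.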