The growing use of Machine Learning (ML) in safety-critical applications necessitates rigorous safety analysis. Hardware reliability assessment is a major concern in measuring the level of safety of ML-based systems. Quantifying the reliability of emerging ML models, including Convolutional Neural Networks (CNNs), is highly complex because of their enormous number of parameters and computations. Conventionally, Fault Injection (FI) is applied to measure reliability. However, performing FI on modern CNNs is prohibitively time-consuming if an acceptable confidence level is to be achieved. Statistical FI (SFI) has been proposed to speed up FI for large CNNs, but its runtime remains considerable. In this work, we introduce DeepVigor+, a scalable, fast, and accurate semi-analytical method as an efficient alternative for reliability measurement in CNNs. DeepVigor+ implements a fault propagation analysis model and aims to derive Vulnerability Factors (VFs) as reliability metrics in an optimal way. The results indicate that DeepVigor+ obtains VFs for CNN models with an error below $1\%$, the error target used in SFI, while requiring $14.9\times$ to $26.9\times$ fewer simulations than the best-known state-of-the-art SFI. DeepVigor+ enables accurate reliability analysis of large and deep CNNs within a few minutes, rather than the days or weeks otherwise needed to reach the same results.
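To make the cost of SFI at a $1\%$ error target concrete, the following is a minimal sketch of the finite-population sample-size formula that is commonly used in statistical fault injection. The abstract does not state which formula its SFI baseline relies on, so the formula choice, the function name `sfi_sample_size`, and all parameter names and defaults here are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def sfi_sample_size(population, error_margin=0.01, confidence=0.95, p=0.5):
    """Number of fault injections needed for a given error margin.

    Assumed formula (finite-population sample size):
        n = N / (1 + e^2 * (N - 1) / (t^2 * p * (1 - p)))
    population   -- N, total number of possible faults (e.g. bit locations)
    error_margin -- e, tolerated error of the estimate (0.01 means 1%)
    confidence   -- two-sided confidence level used to derive t
    p            -- assumed fault-failure probability; 0.5 is the worst case
    """
    # Two-sided critical value of the standard normal distribution.
    t = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n = population / (
        1.0 + error_margin ** 2 * (population - 1) / (t ** 2 * p * (1.0 - p))
    )
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical fault space: one billion candidate bit locations.
    print(sfi_sample_size(10 ** 9))
    # Relaxing the error margin reduces the number of injections needed.
    print(sfi_sample_size(10 ** 9, error_margin=0.05))
```

Under these assumptions the sample size is counted per fault space. If the same calculation is repeated for every layer or bit position of a large CNN, the total number of injections grows accordingly, which is consistent with the long SFI runtimes described above.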