Despite deep learning's transformative impact on various domains, the reliability of Deep Neural Networks (DNNs) remains a pressing concern due to their complexity and data dependency. Traditional software fault localization techniques, such as Spectrum-based Fault Localization (SBFL), have been adapted to DNNs with limited success. Existing methods such as DeepFault employ SBFL measures but fail to account for fault propagation across neural pathways, leading to suboptimal fault detection. To address this gap, we propose NP-SBFL, a method that leverages Layer-wise Relevance Propagation (LRP) to identify and verify critical neural pathways. Our multi-stage gradient ascent (MGA) technique, an extension of gradient ascent (GA), activates neurons sequentially, enhancing fault detection efficacy. We evaluated the effectiveness of our method, i.e., NP-SBFL-MGA, on two commonly used datasets, MNIST and CIFAR-10, against two baselines, DeepFault and NP-SBFL-GA, using three suspiciousness measures: Tarantula, Ochiai, and Barinel. The empirical results show that NP-SBFL-MGA is statistically more effective than the baselines at identifying suspicious paths and synthesizing adversarial inputs. In particular, Tarantula on NP-SBFL-MGA achieved the highest fault detection rate at 96.75%, surpassing DeepFault on Ochiai (89.90%) and NP-SBFL-GA on Ochiai (60.61%). Our approach also yielded results comparable to the baselines in synthesizing natural inputs, and we found a positive correlation between the coverage of critical paths and the number of failed tests in DNN fault localization.
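For reference, the three suspiciousness measures named above have standard SBFL definitions. In the formulas below (a sketch using common SBFL notation, not taken from this paper), $e_f$ and $e_p$ denote the numbers of failing and passing tests that cover a given component (here, a neuron or neural pathway), and $N_f$ and $N_p$ denote the total numbers of failing and passing tests:

$$
\mathrm{Tarantula} = \frac{e_f / N_f}{e_f / N_f + e_p / N_p}, \qquad
\mathrm{Ochiai} = \frac{e_f}{\sqrt{N_f \,(e_f + e_p)}}, \qquad
\mathrm{Barinel} = 1 - \frac{e_p}{e_p + e_f}.
$$

Each measure assigns higher suspiciousness to components exercised predominantly by failing tests; they differ in how strongly they penalize coverage by passing tests.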