As Deep Neural Networks (DNNs) grow in popularity, so does the need for tools that assist developers in implementing, testing and debugging them. Several approaches have been proposed that automatically analyse and localise potential faults in DNNs under test. In this work, we evaluate and compare existing state-of-the-art fault localisation techniques, which are based on dynamic and static analysis of the DNN. The evaluation is performed on a benchmark consisting of both real faults obtained from bug reporting platforms and faulty models produced by a mutation tool. Our findings indicate that evaluating DNN fault localisation tools against a single, specific ground truth (e.g., the human-defined one) yields fairly low performance (maximum average recall of 0.31 and precision of 0.23). However, these figures increase when alternative, equivalent patches that exist for a given faulty DNN are taken into account. The results indicate that \dfd is the most effective tool, achieving an average recall of 0.61 and precision of 0.41 on our benchmark.
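To make the evaluation metrics concrete, the following is a minimal sketch of how the precision and recall of a fault localisation tool could be computed against a single ground truth and against a set of alternative, equivalent ground truths. It is an illustration under stated assumptions, not the procedure used in this work: the set-based representation of faults, the fault labels, the function names `precision_recall` and `best_over_alternatives`, and the rule of keeping the alternative with the highest recall (ties broken by precision) are all hypothetical.

```python
from typing import Iterable, Set, Tuple


def precision_recall(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of the predicted fault set against one ground truth."""
    true_positives = len(predicted & truth)
    # Precision: fraction of reported faults that are in the ground truth.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of ground-truth faults that the tool reported.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def best_over_alternatives(
    predicted: Set[str], alternatives: Iterable[Set[str]]
) -> Tuple[float, float]:
    """Score against every alternative ground truth and keep the best one.

    Assumption: "best" means highest recall, with precision as a tie-breaker.
    """
    scores = [precision_recall(predicted, set(truth)) for truth in alternatives]
    if not scores:
        raise ValueError("at least one ground truth is required")
    return max(scores, key=lambda pr: (pr[1], pr[0]))


# Hypothetical fault labels, used only for illustration.
tool_output = {"learning_rate", "loss_function"}
human_ground_truth = {"optimizer"}
equivalent_patch = {"learning_rate"}

single = precision_recall(tool_output, human_ground_truth)
multi = best_over_alternatives(tool_output, [human_ground_truth, equivalent_patch])
print("single ground truth:", single)
print("with alternatives:  ", multi)
```

In this toy case the tool scores zero against the human-defined ground truth alone, yet it is credited once the equivalent patch is accepted as a valid answer, which mirrors the effect described above.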